I have included the prompts here to provide context to my code and responses. I have also included library() commands whenever applicable, even if not the first time the library is being called, to clearly outline what I am doing here and what I need. I don’t and won’t do this in the real world because it is a waste of space.
I took a stab at all of the Impress Yourself’s. I didn’t quite get them all, but learned a lot along the way.
Lastly, I submitted an initial version of this Midterm Exam at 4:59p to meet the deadline. That submission included all required components. I felt like cleaning things up more, working on more of the Impress Yourself’s, and giving better answers to the Meta Questions. This is my submission that contains that. Since this is coming in after the submission deadline, I do not expect any of the updates to be considered in grading. I did this for my own learning and not for a better grade.
Cheers!
Each of you have a study system you work in and a question of interest. Give an example of one variable that you would sample in order to get a sense of its variation in nature. Describe, in detail, how you would sample for the population of that variable in order to understand its distribution. Questions to consider include, but are not limited to: Just what is your sample versus your population? What would your sampling design be? Why would you design it that particular way? What are potential confounding influences of both sampling technique and sample design that you need to be careful to avoid? What statistical distribution might the variable take, and why?
I am currently working to characterize the interactome of the transcriptional repressor Capicua (CIC) by identifying CIC interactors. Once identified and validated, I want to determine whether they are involved in regulating CIC function. I plan to conduct a series of overexpression and RNAi experiments, in which I will either overexpress or knockdown the interactor and CIC, simultaneously, in Drosophila. I will then observe the development of the wing, where CIC is known to regulate size, number of veins, and vein arborization. For example, CIC overexpression leads to a wing smaller, with fewer and shorter veins, while knockdown or knockout leads to a larger wing with more and longer veins. When compared to these baseline phenotypes, and deviation in wing size, vein length, and vein arborization, can be attributed to the overexpression or knockdown of the interactor. The role of the interactor in the CIC pathway can then be inferred.
In this experiment, I aim to draw conclusions about the function of CIC in Drosophila melanogaster (population) from a set of flies (sample) that I have overexpressed or knocked down CIC in. There are several potential confounding variables in this experiment, including temperature, age, sex, and the variability of the overexpression or RNAi effect across individuals. Foremost, I will conduct all genetic crosses and rearing at 25^o C to remove the potential confound of temperature on protein expression and organism development. This will also cause all progeny to mature at the same rate, so I will collect progeny 12 hours after eclosion to ensure collection of only mature individuals of the same age. I will collect data from males and females separately. I will not be able to control the exact protein levels induced by overexpression or knockdown on the individual level, but I will use the same UAS/Gal4 system to drive protein overexpression or knockout for both CIC and the interactor, across experiments. I will quantify protein expression, post-data collection, via Western Blot and image analysis. The level of protein will be compared to controls to determine overall levels of overexpression or knockdown. However, I will not be able to do this on the individual level as not enough protein would be present for detection via Western Blot. I will therefore need to conduct this experiment across the entire group that was sampled.
The data points will be continuous because my independent variables have an infinite number of potential outcomes. Due to the aforementioned variability in the amount of protein present after overexpression or knockdown, I expect the data to take on a Gaussian (normal, continuous) distribution.
Johns Hopkins has been maintaining one of the best Covid-19 timseries data sets out there. The data on the US can be found here with information about what is in the data at https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data
Before proceeding, I would like to establish a theme for my plots through this midterm. My design goals are to make the plots simple, clear, and well-spaced.
custom_theme <- function(){
font <- "Helvetica" # font selection
theme_minimal() %+replace% # theme based on minimal with following replacements
theme(
panel.grid.major = element_blank(), # leave grids and axis ticks blank
panel.grid.minor = element_blank(),
axis.ticks = element_blank(),
axis.line = element_line(color = "black",
size = 1),
panel.border = element_rect(color = "black",
fill=NA,
size=1),
plot.title = element_text(family = font,
size = 20,
face = 'bold',
hjust = 0.5, # move title to center horizontally
vjust = 2), # move title up a wee bit
plot.subtitle = element_text(
family = font,
size = 15,
hjust = 0.5),
plot.caption = element_text(
family = font,
size = 10,
hjust = 1), # put caption in right corner
axis.title = element_text(
family = font,
face = 'italic',
size = 15),
axis.text = element_text(
family = font,
size = 10),
axis.text.x = element_text(
margin = margin(t = 2, # top
r = 2, # right
b = 2, # bottom
l = 2)) # left
)
}
Download and read in the data. Can you do this without downloading, but read directly from the archive?
library(readr)
covid <- read.csv("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_US.csv")
str(covid)
summary(covid)
setwd("/Users/nathanstrozewski/Downloads/Biological Data Analysis/Midterm/Data/daily incidence files/")
library(data.table) # make data tables
library(dplyr) # pipes
library(tidyr) # allows loading of purrr
library(purrr) # map
covid_incidents <- list.files(path = "/Users/nathanstrozewski/Downloads/Biological Data Analysis/Midterm/Data/daily incidence files/",
pattern = "*.csv") %>%
map_df(~fread(.)) %>%
as.data.frame()
The data is, well, huge. It’s also wide, with dates as columns. Write a function that, given a state name as its input argument, will output a time series (long data where every row is a day) with columns for date, new daily cases and for cumulative cases in that state.
Note, let’s make the date column that emerges a true date object. Let’s say you’ve called it date_col. If you mutate it, mutate(date_col = lubridate::mdy(date_col)), it will be turned into a date object that will have a recognized order. {lubridate} is da bomb, and I’m hoping we have some time to cover it in the future.
Even better - impress yourself (if you want - not required!) and merge it with some other data source to also return cases per 100,000 people.
library(lubridate) # date stuff
library(dplyr) # data manip
library(tidyr) # pivot
table_function <- function(Province_Input) {
covid_state <- covid %>%
filter(Province_State %in% Province_Input) %>%
select(-c(UID, iso2, iso3, code3, FIPS, Admin2, Country_Region, Lat, Long_, Combined_Key)) %>%
pivot_longer(cols = -Province_State,
names_to = "date_col",
values_to = "cumulative_cases") %>%
mutate(date_col = gsub("\\X", "", date_col)) %>%
mutate(date_col = lubridate::mdy(date_col)) %>%
dplyr::group_by(date_col) %>%
dplyr::summarise(cumulative_cases = sum(cumulative_cases)) %>%
mutate(new_cases = cumulative_cases - lag(cumulative_cases, default = first(cumulative_cases)))
date_appendage <- data_frame(date_col = seq(as.Date("2020-01-22"),
as.Date("2020-04-11"),
by = "days"),
incident_rate = NA) # create df with dates that are absent from covid_incidents data file
state_incidents <- covid_incidents %>%
dplyr::filter(Province_State %in% Province_Input) %>%
select(Date, Incident_Rate) %>%
arrange(Date) %>%
rename(date_col = Date,
incident_rate = Incident_Rate)
state_incidents_full <- merge(date_appendage,
state_incidents,
all = TRUE) # merge the placeholder date rows with incidents df
state_covid_full <- merge(covid_state,
state_incidents_full,
all = TRUE) %>% # merge incidents df with main df
drop_na() # remove NAs before plotting later
return(state_covid_full) }
MA_covid <- table_function("Massachusetts")
## Warning: `data_frame()` was deprecated in tibble 1.1.0.
## ℹ Please use `tibble()` instead.
Great! Make a compelling plot of the timeseries for Massachusetts! Points for style, class, ease of understanding major trends, etc. Note, 10/10 for yourself only for the most killer figures. Don’t phone it in! Also, note what the data from JHU is. Do you want the cummulatives, or daily, or what? Want to highlight features? Events? Go wild!
library(ggplot2) # plot
library(gganimate) # animate
library(ggpubr) # other features
library(magick) # combined animated plots
library(gifski) # render
library(png) # png files for output
MA_plot_colors <- c("New Cases" = "salmon",
"Incident Rate" = "blue")
MA_plot <- ggplot(data = MA_covid) +
geom_line(aes(x = date_col,
y = new_cases),
alpha = 1,
linetype = 1) +
geom_line(aes(x = date_col,
y = incident_rate),
alpha = 1,
linetype = 5) +
labs(title = "Covid Cases in Massachusetts",
subtitle = "From Jan 2020 to Present",
x = "Date",
y = "Number of Cases") +
custom_theme() +
transition_reveal(along = date_col)
MA_gif <- animate(MA_plot,
fps = 100,
duration = 1,
width = 500,
height = 500,
renderer = gifski_renderer("/Users/nathanstrozewski/Downloads/Biological Data Analysis/Midterm/Animations/new.gif")) # creates output
MA_mgif <- image_read(MA_gif) # reads it
MA_mgif
# Attempted code to combined 3 separate animated plots
# MA_cumulative_cases_plot <- ggplot(data = MA_covid) +
# geom_line(aes(x = date_col,
# y = cumulative_cases),
# alpha = 1,
# linetype = 1,
# color = "blue") +
# labs(title = "Cumulative Cases",
# x = "Date",
# y = "Number of Cases") +
# transition_reveal(along = date_col) +
# custom_theme()
#
# MA_incident_rate_plot <- ggplot(data = MA_covid) +
# geom_line(aes(x = date_col,
# y = incident_rate),
# alpha = 1,
# linetype = 5,
# color = "magenta") +
# labs(title = "Number of Cases per 100,000 People",
# x = "Date",
# y = "Rate (per 100,000)") +
# transition_reveal(along = date_col) +
# custom_theme()
#
# cumulative_gif <- animate(MA_cumulative_cases_plot,
# fps = 100,
# duration = 1,
# width = 500,
# height = 500,
# renderer = gifski_renderer("/Users/nathanstrozewski/Downloads/Biological Data Analysis/Midterm/Markdown/cumulative.gif"))
#
# incidents_gif <- animate(MA_incident_rate_plot,
# fps = 100,
# duration = 1,
# width = 500,
# height = 500,
# renderer = gifski_renderer("/Users/nathanstrozewski/Downloads/Biological Data Analysis/Midterm/Markdown/incidents.gif"))
# cumulative_mgif <- image_read(cumulative_gif)
# cumulative_mgif
# incidents_mgif <- image_read(incidents_gif)
#
# combined_gif <- image_append(c(new_mgif[1],
# cumulative_mgif[1],
# incidents_mgif[1]),
# stack = TRUE)
#
# for(i in 2:100){
# final <- image_append(c(new_mgif[1],
# cumulative_mgif[1],
# incidents_mgif[1]),
# stack = TRUE)
# combined_gif <- c(combined_gif, final)
# }
I opted to design an animated plot (Figure 1) demonstrating both the new cases and the incident rate across time. This dynamically demonstrates outbreaks and lulls in covid transmission across the state, while also showing how the total number of cases per 100,000 increases over time. I encountered issues getting a legend on my animated gif, and gave up because I wanted to prioritize the other tasks. I’m sure it’s an easy-ish fix. For now, you’ll have to trust me that the dashed line is the Incident Rate and the solid line is New Cases.
Lastly, I coded 3 gif plots that animated similarly to how I animated my final plot. One plot was for new cases, one for cumulative, one for incident rate. When I tried to combine them (see commented code), they combined into one figure (side-by-side) but didn’t animate. Again, I moved on because the plot I decided to use was pretty nifty. I would like to circle back and figure this out in the future though. This could be really cool to have in my tool belt.
Cool. Now, write a function that will take what you did above, and create a plot for any state - so, I enter Alaska and I get the plot for Alaska! If you really want to impress yourself (not required) highlight points of interest - but dynamically using the data.
library(gghighlight) # highlighting points in ggplots
gif_function <- function(Province_Input) {
state_table <- table_function(Province_Input)
province_mean_new_cases <- mean(state_table$new_cases)
province_mean_incidents <- mean(state_table$incident_rate)
state_plot <- ggplot(data = state_table) +
geom_line(aes(x = date_col,
y = new_cases),
alpha = 1,
linetype = 1,
color = "salmon") +
geom_line(aes(x = date_col,
y = incident_rate),
alpha = 1,
linetype = 5,
color = "black") +
labs(title = "Covid Cases in {Province_Input}",
subtitle = "From Jan 2020 to Present",
x = "Date",
y = "Number of Cases") +
custom_theme() +
transition_reveal(along = date_col) +
gghighlight(new_cases >= province_mean_new_cases & incident_rate >= province_mean_incidents)
state_gif <- animate(state_plot,
fps = 100,
duration = 1,
width = 500,
height = 500,
renderer = gifski_renderer("/Users/nathanstrozewski/Downloads/Biological Data Analysis/Midterm/Animations/new.gif")) # creates output
state_mgif <- image_read(state_gif) # reads it
return(state_mgif) }
gif_function("New Hampshire") %>%
print()
## # A tibble: 100 × 7
## format width height colorspace matte filesize density
## <chr> <int> <int> <chr> <lgl> <int> <chr>
## 1 GIF 500 500 sRGB TRUE 0 72x72
## 2 GIF 500 500 sRGB TRUE 0 72x72
## 3 GIF 500 500 sRGB TRUE 0 72x72
## 4 GIF 500 500 sRGB TRUE 0 72x72
## 5 GIF 500 500 sRGB TRUE 0 72x72
## 6 GIF 500 500 sRGB TRUE 0 72x72
## 7 GIF 500 500 sRGB TRUE 0 72x72
## 8 GIF 500 500 sRGB TRUE 0 72x72
## 9 GIF 500 500 sRGB TRUE 0 72x72
## 10 GIF 500 500 sRGB TRUE 0 72x72
## # … with 90 more rows
Here I tried to highlight outbreaks by using gghighlight() to change the color of any point above the mean new_cases or incident_rate. Highlighting this for the incident_rate doesn’t tell us much other than that the data is past the halfway point. I am having a hard time interpreting the highlight for new_cases - it appears that a new line is formed above the existing line. This was not intended. Additionally, adding the highlight removed my existing line colors - which would be completely fine if I could figure out how to create a legend because I could include the line type as the key to the legend. While this isn’t quite the final product I was hoping for, I’m glad I took a stab at this because I learned that gghighlight exists and will play around with it more in the future.
I also am working on adding figure descriptions to animated ggplots - I can do this quite well for non-animated ggplots (see later code in Question 3, 4, and 5).
This is Figure 2 FYI (and the last one was Figure 1) - so don’t be alarmed when my next figure is called Figure 3 without previous figures being labeled as such in the added descriptions.
Let’s fit and evaluate a linear model! To motivate this, we’ll look at Burness et al.’s 2012 study “Post-hatch heat warms adult beaks: irreversible physiological plasticity in Japanese quail http://rspb.royalsocietypublishing.org/content/280/1767/20131436.short the data for which they have made available at Data Dryad at http://datadryad.org/resource/doi:10.5061/dryad.gs661. We’ll be doing a slightly modified version of their analysis.
Starting with loading this (new to you) data set for the very first time, what are the steps that you would take to analyze this data? Write them out! Maybe even in comments!
First, I will read the README files associated with this data to understand what is being measured, what the variables are, and any notes the developers feel are important for me to know. Next, I’ll load the data and review its format, the classes of the variables, etc. to know if I need to make and manipulations for more efficient processing.
I’ll next plot variables of interest to get a general idea of the relationship between them. Following this, I’ll create a linear model using the variables of interest and evaluate the assumptions of the model. I’ll make any necessary modifications to the model and reevaluate assumptions. Once I am comfortable with the mode, I’ll evaluate the model by determining the effects of my independent variable(s) on my dependent variable, the correlation between them, etc. I’ll then plot the model to demonstrate these findings visually.
Load the data. Look at it. Anything interesting? Anything you’d want to watch out for? Or is it OK? Anything you’d change to make working with the data easier? Also, manipulate it a bit - for our analysis, we’re going to want Bird # and the Age of the birds to be categorical variables. Make sure we have variables that will work!
Birds #16 and #18 had deformities to the bill. Is it possible that these deformities impacted some variables of interest? I should consider removing these from the dataset to avoid confounds.
quail <- read.csv("/Users/nathanstrozewski/Downloads/Biological Data Analysis/Midterm/Data/doi_10.5061_dryad.gs661__v1/Morphology data.csv")
str(quail) # review of different vars and their clases
## 'data.frame': 880 obs. of 10 variables:
## $ Bird.. : int 1 2 3 4 5 6 7 8 9 10 ...
## $ Sex : chr "" "m" "f" "m" ...
## $ Age..days. : int 5 5 5 5 5 5 5 5 5 5 ...
## $ Exp..Temp...degree.C.: int 15 15 15 15 15 15 15 15 15 15 ...
## $ Mass..g. : num 16.1 19.2 17.5 14.4 17.4 ...
## $ Tarsus..mm. : num 19.4 20.4 19 20.1 21.8 ...
## $ Culmen..mm. : num 7.64 7.49 7.31 7.34 8.24 6.82 7.84 7.39 7.4 7.81 ...
## $ Depth..mm. : num 4.23 4.46 3.92 3.85 4.42 3.65 3.94 3.72 4.5 4.09 ...
## $ Width..mm. : num 4.49 4.44 4.01 4.22 4.56 3.73 4.6 4.66 3.83 3.89 ...
## $ NOTES : chr "" "" "" "" ...
summary(quail) # provides some summary stats of the vars
## Bird.. Sex Age..days. Exp..Temp...degree.C.
## Min. : 1.00 Length:880 Min. : 5 Min. :15.0
## 1st Qu.:10.75 Class :character 1st Qu.: 15 1st Qu.:15.0
## Median :20.50 Mode :character Median : 26 Median :22.5
## Mean :20.50 Mean : 37 Mean :22.5
## 3rd Qu.:30.25 3rd Qu.: 49 3rd Qu.:30.0
## Max. :40.00 Max. :123 Max. :30.0
##
## Mass..g. Tarsus..mm. Culmen..mm. Depth..mm.
## Min. : 12.51 Min. :16.19 Min. : 4.23 Min. :3.300
## 1st Qu.: 50.06 1st Qu.:26.19 1st Qu.: 9.31 1st Qu.:5.058
## Median :127.58 Median :33.81 Median :11.98 Median :5.600
## Mean :138.75 Mean :31.73 Mean :11.73 Mean :5.593
## 3rd Qu.:219.59 3rd Qu.:37.17 3rd Qu.:14.03 3rd Qu.:6.110
## Max. :375.00 Max. :49.42 Max. :18.14 Max. :9.520
## NA's :76 NA's :112 NA's :114 NA's :112
## Width..mm. NOTES
## Min. :3.290 Length:880
## 1st Qu.:4.447 Class :character
## Median :4.840 Mode :character
## Mean :4.823
## 3rd Qu.:5.170
## Max. :6.920
## NA's :112
There are many NAs. Additionally, the variable names contain many periods and may be difficult to work with - I should rename those. I’ll also change Bird# and Age variables to categorical for future analyses.
library(tidyr) # tidy capabilities
library(dplyr) # data manipulation
quail <- quail %>%
rename(bird_num = Bird.., age_days = Age..days., temp_celsius = Exp..Temp...degree.C.,
mass_g = Mass..g., tarsus_mm = Tarsus..mm., culmen_mm = Culmen..mm.,
depth_mm = Depth..mm., width_mm = Width..mm.) %>% # rename variables
mutate(age_days = as.factor(age_days),
bird_num = as.factor(bird_num)) %>% # change vars to factors
subset(bird_num != 16) %>% # remove bird with reported deformities
subset(bird_num != 18) %>% # remove bird with reported deformities
drop_na() # remove NAs
The model we will fit is one where we want to look at how temperature treatment affects the development of tarsus length over time (age). Visualize this. Make it look good! Is there anything here that would shape how you fit a model? Do you see why we are treating age as categorical?
library(ggplot2) # plotting
quail_plot <- ggplot(data = quail,
mapping = aes(x = age_days,
y = tarsus_mm,
color = temp_celsius)) +
geom_point() +
facet_wrap(vars(temp_celsius)) + # split the data into plots for each temp
labs(title = "Influence of Temperature",
subtitle = "on Tarsus Length Over Time",
x = "Age (days)",
y = "Tarsus Length (mm)",
color = "Temperature (C)") +
custom_theme() +
theme(axis.text.x= element_text(size = 6))
quail_plot_text <- paste("Figure 3: The influence of temperature on tarsus length was plotted",
"across time. Data was obtained from Burness et al. (2013)",
"https://doi.org/10.5061/dryad.gs661",
sep = " ") # write the figure description
quail_plot_text_p <- ggparagraph (text = quail_plot_text,
size = 10,
color = "black") # format description
quail_plot_with_text <- ggarrange(quail_plot,
quail_plot_text_p,
ncol = 1,
nrow = 2,
heights = c(2.5, 1)) # combined figure and description
quail_plot_with_text
Tarsus length increases aver time then eventually levels off. This suggests that I could log-transform in a model. Age is measured in specific days and isn’t necessarily a clean, continuous variable - especially with all the variability in when measurements increments (e.g., the jump from sampling at 80 to 123 days). Making age categorical removes issues I’d run into work with this.
Fit a model where tarsus length is predicted by age, treatment, their interaction, and a ‘block’ effect, as it were, of bird #. Evaluate the fit. Make any modifications as you see necessary due to your model checks. Note, it’s not going to be perfect (I checked the original - hey, you can too - and they’re on the edge) - but our models are robust, so we’re OK.
quail_lm <- lm(log(tarsus_mm) ~ age_days*temp_celsius + bird_num,
data = quail)
library(performance) # model assessment capabilities
check_model(quail_lm)
The model-predicted data matches the observed data very well. Jarrett has mentioned that this is often enough to conclude that the model is good-to-go. But I’m a good scientist so I’ll look through the other assumption checks before proceeding.
Quickly scanning the assumption checks - there are no apparent concerns with linearity, homogeneity of variance, or influential observations. The normality of residuals looks pretty good…the points on either end of the distribution quantiles stray a bit from the line - but not enough to be concerned. However, the collinearity is very high for all IVs. I would like to use the cor() function to look at the correlations closer, but two of my three IVs are not numeric (age and bird number). I also can’t do much centering - age is discrete and birds were only tested at one of two temperature so centering wouldn’t help much. Therefore, it seems my best option is to move forward with the model I currently have.
A central question of this paper is - does temperature affect the development of tarsus length. With your fit model in hand, answer the question, however you deem fit. Make sure that, alongside any numerical answers you produce, there is at least one stunning figure for a publishable paper!
library(broom) # some good summary functions
library(emmeans) # cool & easy statistics
r2(quail_lm)
## # R2 for Linear Regression
## R2: 0.946
## adj. R2: 0.939
emmeans(quail_lm,
specs = ~temp_celsius) %>%
contrast(method = "pairwise") %>%
confint()
## contrast estimate SE df lower.CL upper.CL
## temp_celsius15 - temp_celsius30 -0.0356 0.00552 649 -0.0465 -0.0248
##
## Results are averaged over the levels of: age_days, bird_num
## Results are given on the log (not the response) scale.
## Confidence level used: 0.95
The R2 value for my model is 0.949, suggesting that the independent variables included (temperature, age, and bird number) are likely responsible for a large portion of the fate of the dependent variable (tarsus length). Pairwise comparison of the effect of temperature on tarsus length indicates that a temperature of 30 degrees Celsius yields a slightly higher mean tarsus length than a temperature of 15 degrees Celsius. Thus, I can conclude from this analysis that temperature is positively correlated to tarsus length. This is somewhat expected, as temperature is known to increase developmental rates in organisms ranging from yeast to Drosophila.
library(visreg) # for visualizing model
library(ggplot2) # for plotting features
library(ggpubr) # for combining plotting objects
quail_lm_visreg <- visreg(quail_lm,
"temp_celsius",
gg = TRUE) +
custom_theme() +
labs(title = "Influence of Temperature on Tarsus Length",
x = "Temperature (C)",
y = "log(Tarsus Length)") # plot with labels
## Warning: Note that you are attempting to plot a 'main effect' in a model that contains an
## interaction. This is potentially misleading; you may wish to consider using the 'by'
## argument.
## Conditions used in construction of plot
## age_days: 5
## bird_num: 2
quail_lm_visreg_text <- paste("Figure 4: The influence of temperature, age, and bird number",
"on tarsus length was modeled and log transformed. Tarsus length",
"was then plotted by temperature.",
sep = " ") # write the figure description
quail_lm_visreg_text_p <- ggparagraph(text = quail_lm_visreg_text,
size = 10,
color = "black") # format description
quail_lm_visreg_with_text <- ggarrange(quail_lm_visreg,
quail_lm_visreg_text_p,
ncol = 1,
nrow = 2,
heights = c(2.5, 1)) # combined figure and description
quail_lm_visreg_with_text
Plotting the log of tarsus length by temperature indicates that temperature increases slgihtly at 30 degrees C compared to 15 degrees C. This aligns with the aforementioned conclusions from the emmeans() analysis.
library(visreg) # for visualizing model
library(ggplot2) # for plotting features
library(ggpubr) # for combining plotting objects
quail_lm_visreg2d <- visreg2d(quail_lm,
x = "age_days",
y = "temp_celsius", # create 2d visreg
plot.type = "gg" , # make it work with gg
nn = 50) + # make the gradient smoother
custom_theme() +
labs(title = "Influence of Temperature",
subtitle = " on Tarsus Length Over Time",
x = "Age (days)",
y = "Temperature (C)")
## Warning in predict.lm(fit, newdata = DD): prediction from a rank-deficient fit
## may be misleading
quail_lm_visreg2d_text <- paste("Figure 5: Tarsus length at various temperatures",
"and ages was plotted. Red denotes high tarsus",
"length while blue denotes low tarsus length",
sep = " ") # write the figure description
quail_lm_visreg2d_text_p <- ggparagraph(text = quail_lm_visreg2d_text,
size = 10,
color = "black") # format description
quail_lm_visreg2d_with_text <- ggarrange(quail_lm_visreg2d,
quail_lm_visreg2d_text_p,
ncol = 1,
nrow = 2,
heights = c(2.5, 1)) # combined figure and description
quail_lm_visreg2d_with_text
Plotting tarsus length across both temperature and age supports the conclusion from Figure 4 that tarsus length increases at higher temperatures. This is clear between ages 19-23, where longer tarsus lengths (darker red) are observed at 30^o Celsius and shorter tarsus lengths (lighter red) are observed at 15^o Celsius. This figure also indicates that tarsus length increases with age at both temperatures, as shorter tarsus lengths (blue) are seen between ages ~5-15 while longer tarsus lengths are seen at ages greater than 17.
Sometimes we bootstrap when things don’t quite meet parametric assumptions. In the above data, let’s look at a relationship and then bootstrap it’s coefficients.
Look at the relationship between culmen and tarsus length using a linear model. What’s off here? Don’t worry about any other predictors for culmen length other than tarsus length.
quail <- read.csv("/Users/nathanstrozewski/Downloads/Biological Data Analysis/Midterm/Data/doi_10.5061_dryad.gs661__v1/Morphology data.csv")
str(quail)
## 'data.frame': 880 obs. of 10 variables:
## $ Bird.. : int 1 2 3 4 5 6 7 8 9 10 ...
## $ Sex : chr "" "m" "f" "m" ...
## $ Age..days. : int 5 5 5 5 5 5 5 5 5 5 ...
## $ Exp..Temp...degree.C.: int 15 15 15 15 15 15 15 15 15 15 ...
## $ Mass..g. : num 16.1 19.2 17.5 14.4 17.4 ...
## $ Tarsus..mm. : num 19.4 20.4 19 20.1 21.8 ...
## $ Culmen..mm. : num 7.64 7.49 7.31 7.34 8.24 6.82 7.84 7.39 7.4 7.81 ...
## $ Depth..mm. : num 4.23 4.46 3.92 3.85 4.42 3.65 3.94 3.72 4.5 4.09 ...
## $ Width..mm. : num 4.49 4.44 4.01 4.22 4.56 3.73 4.6 4.66 3.83 3.89 ...
## $ NOTES : chr "" "" "" "" ...
summary(quail)
## Bird.. Sex Age..days. Exp..Temp...degree.C.
## Min. : 1.00 Length:880 Min. : 5 Min. :15.0
## 1st Qu.:10.75 Class :character 1st Qu.: 15 1st Qu.:15.0
## Median :20.50 Mode :character Median : 26 Median :22.5
## Mean :20.50 Mean : 37 Mean :22.5
## 3rd Qu.:30.25 3rd Qu.: 49 3rd Qu.:30.0
## Max. :40.00 Max. :123 Max. :30.0
##
## Mass..g. Tarsus..mm. Culmen..mm. Depth..mm.
## Min. : 12.51 Min. :16.19 Min. : 4.23 Min. :3.300
## 1st Qu.: 50.06 1st Qu.:26.19 1st Qu.: 9.31 1st Qu.:5.058
## Median :127.58 Median :33.81 Median :11.98 Median :5.600
## Mean :138.75 Mean :31.73 Mean :11.73 Mean :5.593
## 3rd Qu.:219.59 3rd Qu.:37.17 3rd Qu.:14.03 3rd Qu.:6.110
## Max. :375.00 Max. :49.42 Max. :18.14 Max. :9.520
## NA's :76 NA's :112 NA's :114 NA's :112
## Width..mm. NOTES
## Min. :3.290 Length:880
## 1st Qu.:4.447 Class :character
## Median :4.840 Mode :character
## Mean :4.823
## 3rd Qu.:5.170
## Max. :6.920
## NA's :112
quail <- quail %>%
rename(bird_num = Bird.., age_days = Age..days., temp_celsius = Exp..Temp...degree.C.,
mass_g = Mass..g., tarsus_mm = Tarsus..mm., culmen_mm = Culmen..mm.,
depth_mm = Depth..mm., width_mm = Width..mm.) %>%
mutate(age_days = as.factor(age_days),
bird_num = as.factor(bird_num)) %>%
subset(bird_num != 16) %>% # remove bird with reported deformities
subset(bird_num != 18) %>% # remove bird with reported deformities
drop_na()
quail_culmen_lm <- lm(culmen_mm ~ tarsus_mm,
data = quail)
library(performance) # model assessment capabilities
check_model(quail_culmen_lm)
The reference lines for the linearity and HoV plots are a bit wavy - whereas they should be flat. They don’t look THAT bad (I know, a bit subjective), and because the PPC is really good I think this model looks good overall.
Write a function that, given a formula and a data set, will return the coefficients of a linear model fit to that data set as a tibble. Just the coefficients. Nothing else. Show that it works with the whole dataset to recover the coefficients from above.
coefficient_function <- function(formula, dataset) {
function_lm <- lm(formula,
data = dataset)
coef_tibble <- as_tibble(coef(function_lm))
return(coef_tibble)
}
coefficient_function_test <- coefficient_function(formula = culmen_mm ~ tarsus_mm,
dataset = quail)
quail_dataset_coef <- as_tibble(coef(quail_culmen_lm))
coefficient_function_test_coef <- coefficient_function(culmen_mm ~ tarsus_mm, quail)
quail_dataset_coef
## # A tibble: 2 × 1
## value
## <dbl>
## 1 -0.104
## 2 0.373
coefficient_function_test_coef
## # A tibble: 2 × 1
## value
## <dbl>
## 1 -0.104
## 2 0.373
Write a function that takes a dataset as input, then outputs a bootstrap sample of that data - it should have the same number of rows, and be sampled with replacement. Show that, if you make one bootstrapped replicate of your dataset and you fit a model using your function from above, your coefficients are a wee bit different. If you want to IMPRESS YOURSELF sample so that you get back the same number of rows of data by temperature and age (this is called stratified bootstrapping). Note - look at sample() or dplyr::sample_frac().
library(bootstrap)
##
## Attaching package: 'bootstrap'
## The following object is masked from 'package:broom':
##
## bootstrap
bootstrap_function <- function(dataset) {
bootstrap_sample <- sample_frac(dataset,
size = 1,
replace = TRUE)
return(bootstrap_sample)
} # create function
bootstrap_df <- bootstrap_function(quail) # check that it works
bootstrap_test_lm <- lm(culmen_mm ~ tarsus_mm,
data = bootstrap_df) # create a matching model from it
coef(quail_culmen_lm) # generate coefs
## (Intercept) tarsus_mm
## -0.1042465 0.3725280
coef(bootstrap_test_lm) # compare coefs to original model
## (Intercept) tarsus_mm
## -0.05443067 0.36988998
Here’s a stab at stratifying by temperature and age:
# library(splitstackshape) # has a cool stratify function
#
# stratify_function <- function(dataset, stratifier_a, stratifier_b) {
#
# stratified_sample <- stratified(indt = dataset,
# group = c($stratifier_a, $stratifier_b),
# size = c(lendth($stratifier_a), length($stratifier_b)))
# return(stratified_sample)
# }
#
# stratify_function(quail, age_days)
IDK I’m tired. Ignore this one.
Write a function that, given a formula and a data set, will return one bootstrapped set of coefficients using the functions from above. Again, show it works.
bootstrap_function_combo <- function(formula, dataset) {
bootstrapped_data <- bootstrap_function(dataset)
combo_output <- coefficient_function(formula, bootstrapped_data)
return(combo_output)
}
bootstrap_function_combo(culmen_mm ~ tarsus_mm, quail)
## # A tibble: 2 × 1
## value
## <dbl>
## 1 -0.0608
## 2 0.373
coef(quail_culmen_lm)
## (Intercept) tarsus_mm
## -0.1042465 0.3725280
OK! Now, using these functions and some of the iteration tools you know, get 1,000 bootstrapped replicate coefficients, and then report out the bootstrapped coefficient values and their SE. How does it compare to the results from your original lm?
bootstrap_rep_test <- replicate(n = 1000,
rowMeans(bootstrap_function_combo(culmen_mm ~ tarsus_mm, quail)))
se_bootstrap_rep <- sd(bootstrap_rep_test)
bootstrap_rep_test
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] -0.4161543 -0.1526042 -0.1640434 -0.5264739 0.2590518 0.03601818
## [2,] 0.3843974 0.3744300 0.3736359 0.3858544 0.3591996 0.36879031
## [,7] [,8] [,9] [,10] [,11] [,12]
## [1,] -0.3082549 -0.3712871 -0.1613291 0.05293866 0.01962066 -0.1663441
## [2,] 0.3809130 0.3834267 0.3750556 0.36820252 0.36844073 0.3743344
## [,13] [,14] [,15] [,16] [,17] [,18]
## [1,] -0.1346899 0.2848961 -0.1796633 -0.1606071 -0.3157916 -0.1357093
## [2,] 0.3686215 0.3596903 0.3752197 0.3731305 0.3780949 0.3695803
## [,19] [,20] [,21] [,22] [,23] [,24]
## [1,] -0.3950814 -0.1857207 -0.1893422 -0.01394575 0.4234347 0.2189330
## [2,] 0.3818572 0.3761776 0.3745477 0.37146381 0.3550579 0.3637326
## [,25] [,26] [,27] [,28] [,29] [,30]
## [1,] -0.07092649 -0.1831385 -0.0764131 0.001680734 -0.2371758 -0.3897175
## [2,] 0.37465787 0.3744436 0.3727468 0.366943514 0.3748997 0.3820163
## [,31] [,32] [,33] [,34] [,35] [,36] [,37]
## [1,] -0.08745159 -0.1522884 0.1472700 0.1238035 0.1252287 0.01890721 -0.0210609
## [2,] 0.37287083 0.3746489 0.3650547 0.3615403 0.3621690 0.37026358 0.3688775
## [,38] [,39] [,40] [,41] [,42] [,43]
## [1,] -0.1155230 -0.1456761 -0.08990403 0.1702856 -0.07904259 -0.03827613
## [2,] 0.3694792 0.3749027 0.37133469 0.3633369 0.37064859 0.37117345
## [,44] [,45] [,46] [,47] [,48] [,49]
## [1,] -0.09987244 0.2204412 -0.3171397 -0.3178905 -0.01431892 -0.2815505
## [2,] 0.37319475 0.3607403 0.3789350 0.3797002 0.36865386 0.3774923
## [,50] [,51] [,52] [,53] [,54] [,55]
## [1,] -0.4136641 -0.2250118 -0.1510223 -0.2544632 -0.2506401 0.07683982
## [2,] 0.3830844 0.3743316 0.3742318 0.3770195 0.3744488 0.36732310
## [,56] [,57] [,58] [,59] [,60] [,61]
## [1,] 0.1053151 0.03356326 -0.2812103 -0.1129048 -0.1038253 -0.1781964
## [2,] 0.3626336 0.36944961 0.3784192 0.3705594 0.3720134 0.3758376
## [,62] [,63] [,64] [,65] [,66] [,67]
## [1,] -0.1727214 -0.1442376 -0.1882993 -0.3593518 -0.1370012 -0.0749819
## [2,] 0.3754058 0.3725344 0.3737185 0.3824291 0.3741462 0.3703229
## [,68] [,69] [,70] [,71] [,72] [,73]
## [1,] 0.1662639 -0.04608364 -0.2856891 -0.2059424 -0.2120842 -0.3755977
## [2,] 0.3615931 0.37270289 0.3781760 0.3766609 0.3754913 0.3815586
## [,74] [,75] [,76] [,77] [,78] [,79]
## [1,] -0.01288918 -0.09190948 -0.2005001 0.3109164 -0.2524578 -0.003681623
## [2,] 0.37187185 0.37327463 0.3773014 0.3603351 0.3762848 0.368433917
## [,80] [,81] [,82] [,83] [,84] [,85] [,86]
## [1,] 0.08591486 0.4047038 -0.2133811 -0.4916114 -0.2358855 0.03338947 0.1622796
## [2,] 0.36474713 0.3534836 0.3743118 0.3849025 0.3771584 0.36934215 0.3642336
## [,87] [,88] [,89] [,90] [,91] [,92]
## [1,] -0.1419180 -0.3171961 -0.1681650 0.01431316 -0.2898806 -0.2224035
## [2,] 0.3729683 0.3801317 0.3756406 0.36987235 0.3776611 0.3771868
## [,93] [,94] [,95] [,96] [,97] [,98] [,99]
## [1,] -0.3433555 0.2360705 0.2148188 -0.2529450 0.06088361 -0.3067040 -0.4352128
## [2,] 0.3807841 0.3596056 0.3623019 0.3785206 0.36864715 0.3778703 0.3815922
## [,100] [,101] [,102] [,103] [,104] [,105]
## [1,] -0.1325351 -0.2046178 -0.06772322 -0.2117644 -0.3376841 -0.1033122
## [2,] 0.3734733 0.3779295 0.37189388 0.3757287 0.3792521 0.3739601
## [,106] [,107] [,108] [,109] [,110] [,111] [,112]
## [1,] 0.0769378 0.1059179 0.1947972 -0.2449660 -0.1153896 -0.04060904 -0.1818336
## [2,] 0.3654738 0.3669622 0.3643204 0.3791011 0.3736637 0.37044167 0.3764744
## [,113] [,114] [,115] [,116] [,117] [,118]
## [1,] 0.008781557 -0.2467120 -0.1145895 0.2929579 -0.3291352 0.5133065
## [2,] 0.369541895 0.3785016 0.3708676 0.3624540 0.3782801 0.3506718
## [,119] [,120] [,121] [,122] [,123] [,124]
## [1,] -0.6024835 -0.1529919 0.2249507 -0.1456046 0.01624928 -0.3181497
## [2,] 0.3882913 0.3722485 0.3627086 0.3727061 0.36736049 0.3795371
## [,125] [,126] [,127] [,128] [,129] [,130]
## [1,] 0.03379285 -0.06811182 -0.1388818 0.03001812 -0.1774813 -0.1298345
## [2,] 0.36394374 0.36942098 0.3725475 0.36939298 0.3747045 0.3739164
## [,131] [,132] [,133] [,134] [,135] [,136]
## [1,] -0.1240643 -0.1582547 -0.03761552 -0.1796160 0.01266891 -0.05192589
## [2,] 0.3727764 0.3750266 0.37049218 0.3783679 0.36593958 0.37079948
## [,137] [,138] [,139] [,140] [,141] [,142]
## [1,] 0.08162364 -0.1357124 -0.2421760 -0.1016078 -0.06465409 -0.2514302
## [2,] 0.36794596 0.3726387 0.3770183 0.3721435 0.36905861 0.3772411
## [,143] [,144] [,145] [,146] [,147] [,148] [,149]
## [1,] -0.3302042 0.2473171 -0.1024475 0.1434841 0.02163621 -0.157790 -0.1148461
## [2,] 0.3795141 0.3596508 0.3731115 0.3637962 0.36790269 0.373969 0.3736895
## [,150] [,151] [,152] [,153] [,154] [,155]
## [1,] -0.1753254 -0.06242418 -0.250281 -0.2306125 -0.06981966 -0.1174636
## [2,] 0.3752541 0.37023372 0.376001 0.3781700 0.37201676 0.3726591
## [,156] [,157] [,158] [,159] [,160] [,161]
## [1,] -0.1850750 0.002241244 -0.3778804 0.1463786 -0.2303706 0.1452589
## [2,] 0.3764922 0.369665592 0.3792280 0.3659399 0.3779399 0.3654494
## [,162] [,163] [,164] [,165] [,166] [,167]
## [1,] -0.2960610 0.08425598 -0.2702203 -0.1385690 -0.1974941 -0.1672543
## [2,] 0.3786561 0.37083406 0.3792295 0.3735312 0.3756555 0.3745102
## [,168] [,169] [,170] [,171] [,172] [,173]
## [1,] 0.09698716 -0.1272906 -0.03884571 -0.04292026 -0.2205603 -0.2042800
## [2,] 0.36594855 0.3719951 0.37178885 0.36989178 0.3764357 0.3744136
## [,174] [,175] [,176] [,177] [,178] [,179]
## [1,] -0.2450589 -0.2796055 -0.1070161 -0.1146885 -0.1036239 0.2709764
## [2,] 0.3787473 0.3787128 0.3721641 0.3731860 0.3704983 0.3603681
## [,180] [,181] [,182] [,183] [,184] [,185]
## [1,] -0.3113160 -0.01177979 0.1823580 0.2448249 -0.3279314 -0.4479079
## [2,] 0.3802109 0.37244157 0.3631711 0.3599937 0.3808818 0.3838047
## [,186] [,187] [,188] [,189] [,190] [,191]
## [1,] -0.07375241 -0.1883592 -0.01445596 0.2419397 -0.08059026 -0.1190217
## [2,] 0.37260016 0.3750987 0.37228958 0.3603868 0.36988754 0.3727273
## [,192] [,193] [,194] [,195] [,196] [,197] [,198]
## [1,] -0.1120486 -0.1324020 0.1082877 -0.4708168 -0.0621807 0.2673257 -0.1500376
## [2,] 0.3723796 0.3747695 0.3650623 0.3838927 0.3691828 0.3593580 0.3727868
## [,199] [,200] [,201] [,202] [,203] [,204] [,205]
## [1,] -0.0533942 -0.1369307 -0.1051899 0.04329212 0.06462983 0.1202129 0.2132213
## [2,] 0.3709593 0.3735498 0.3739613 0.36614067 0.36576947 0.3634184 0.3582632
## [,206] [,207] [,208] [,209] [,210] [,211]
## [1,] -0.08467422 -0.1291740 0.2554928 -0.2763201 -0.03063125 -0.3429280
## [2,] 0.37145484 0.3734843 0.3610293 0.3780025 0.36957167 0.3778996
## [,212] [,213] [,214] [,215] [,216] [,217] [,218]
## [1,] -0.1940538 0.2040595 -0.2879113 -0.2430651 0.2609003 0.03502678 -0.1879673
## [2,] 0.3747191 0.3623445 0.3784098 0.3767312 0.3600893 0.36963214 0.3748069
## [,219] [,220] [,221] [,222] [,223] [,224] [,225]
## [1,] -0.1654463 0.3118834 0.1308958 0.1089022 -0.2251165 -0.0364295 -0.03731145
## [2,] 0.3743780 0.3588584 0.3662576 0.3650698 0.3737674 0.3703338 0.36789814
## [,226] [,227] [,228] [,229] [,230] [,231]
## [1,] -0.5190838 -0.08225747 -0.3813338 0.05427234 -0.1423484 0.06619925
## [2,] 0.3867216 0.37476397 0.3787418 0.36738354 0.3727972 0.36533719
## [,232] [,233] [,234] [,235] [,236] [,237]
## [1,] -0.4568890 -0.01922579 -0.1990010 -0.2650054 -0.09608303 -0.3597731
## [2,] 0.3833622 0.36872864 0.3737839 0.3788929 0.37294340 0.3821599
## [,238] [,239] [,240] [,241] [,242] [,243]
## [1,] -0.4393778 0.04975262 0.01260484 -0.3057756 -0.01270678 -0.1863680
## [2,] 0.3839626 0.36682558 0.36890682 0.3807048 0.37059284 0.3746265
## [,244] [,245] [,246] [,247] [,248] [,249]
## [1,] 0.08924567 -0.08940521 -0.3023639 -0.2052988 0.01159391 -0.3274017
## [2,] 0.36683230 0.37047085 0.3794736 0.3749182 0.36354002 0.3798253
## [,250] [,251] [,252] [,253] [,254] [,255]
## [1,] -0.07622352 -0.3148851 0.1372454 -0.08039572 -0.5986285 -0.2560291
## [2,] 0.37101506 0.3796713 0.3626057 0.36997893 0.3890378 0.3764397
## [,256] [,257] [,258] [,259] [,260] [,261]
## [1,] -0.05666978 -0.3224625 -0.09234922 -0.1161972 -0.3645353 -0.0767699
## [2,] 0.36868760 0.3788882 0.37224131 0.3720454 0.3806949 0.3702459
## [,262] [,263] [,264] [,265] [,266] [,267]
## [1,] -0.1005809 0.02044883 -0.1720756 -0.2557347 -0.07279276 0.2354310
## [2,] 0.3699732 0.36807877 0.3757501 0.3782284 0.37101768 0.3609313
## [,268] [,269] [,270] [,271] [,272] [,273] [,274]
## [1,] -0.3076162 0.03581881 0.03996188 -0.1498088 -0.139312 0.1682392 -0.3365907
## [2,] 0.3792713 0.36823339 0.36899263 0.3727201 0.376054 0.3623833 0.3789216
## [,275] [,276] [,277] [,278] [,279] [,280]
## [1,] -0.1370840 -0.1635726 -0.09696348 0.05198834 -0.1518654 -0.02906144
## [2,] 0.3739901 0.3739848 0.37182248 0.36742478 0.3717937 0.36892926
## [,281] [,282] [,283] [,284] [,285] [,286]
## [1,] -0.2450001 -0.3621333 0.2844793 -0.3419839 -0.2107944 -0.3787555
## [2,] 0.3750834 0.3820580 0.3609745 0.3801081 0.3737925 0.3826865
## [,287] [,288] [,289] [,290] [,291] [,292]
## [1,] -0.08613829 -0.0818870 0.06417902 -0.3836092 0.4169953 -0.1738766
## [2,] 0.37097463 0.3696697 0.36679189 0.3830546 0.3554479 0.3734527
## [,293] [,294] [,295] [,296] [,297] [,298]
## [1,] -0.0907793 -0.5421065 0.04957468 -0.2094686 -0.1269108 -0.6134875
## [2,] 0.3721510 0.3855608 0.36743322 0.3750593 0.3718680 0.3906838
## [,299] [,300] [,301] [,302] [,303] [,304]
## [1,] 0.08376712 -0.3766151 0.0253329 -0.1459151 -0.07652126 -0.1073754
## [2,] 0.36622774 0.3822073 0.3700356 0.3731080 0.37304300 0.3743579
## [,305] [,306] [,307] [,308] [,309] [,310]
## [1,] -0.05920611 -0.4471013 0.03135869 0.01458738 0.0369101 -0.05579343
## [2,] 0.37159791 0.3836717 0.37000635 0.36858100 0.3680410 0.37173457
## [,311] [,312] [,313] [,314] [,315] [,316]
## [1,] -0.1893853 -0.08658372 -0.1587587 0.1468991 -0.3325628 -0.3913811
## [2,] 0.3755551 0.37294675 0.3744133 0.3666093 0.3821305 0.3809802
## [,317] [,318] [,319] [,320] [,321] [,322]
## [1,] -0.1376021 0.1794173 -0.1050646 0.07579948 -0.1785377 -0.4230223
## [2,] 0.3730250 0.3635368 0.3725147 0.36876908 0.3743966 0.3836770
## [,323] [,324] [,325] [,326] [,327] [,328]
## [1,] -0.2174537 0.1387264 -0.4056558 -0.2236021 0.1539415 -0.5149880
## [2,] 0.3747804 0.3673821 0.3861636 0.3796437 0.3641627 0.3851015
## [,329] [,330] [,331] [,332] [,333] [,334]
## [1,] -0.04106834 0.0113481 -0.1579838 0.1985707 -0.02232346 -0.02080002
## [2,] 0.37050281 0.3700942 0.3754385 0.3624859 0.37181585 0.37368279
## [,335] [,336] [,337] [,338] [,339] [,340]
## [1,] -0.2390761 0.2742716 0.1908342 -0.09094124 -0.3370107 -0.0337979
## [2,] 0.3744758 0.3605618 0.3601297 0.37119681 0.3798416 0.3703386
## [,341] [,342] [,343] [,344] [,345] [,346]
## [1,] 0.01913269 0.09460888 0.1344759 -0.3009278 -0.2954002 -0.1686379
## [2,] 0.36827117 0.36454819 0.3674220 0.3791489 0.3780841 0.3750347
## [,347] [,348] [,349] [,350] [,351] [,352]
## [1,] -0.3138877 -0.1237848 -0.157764 -0.004946888 -0.1857870 -0.04588687
## [2,] 0.3813187 0.3734334 0.373893 0.370815199 0.3750669 0.37269056
## [,353] [,354] [,355] [,356] [,357] [,358]
## [1,] -0.1624049 -0.3724774 -0.1724397 -0.1401155 0.02016355 -0.1011054
## [2,] 0.3733254 0.3797795 0.3734541 0.3742726 0.36803011 0.3715463
## [,359] [,360] [,361] [,362] [,363] [,364]
## [1,] -0.2216434 0.1216956 -0.2310558 -0.3688492 -0.2097506 -0.004000026
## [2,] 0.3752909 0.3673340 0.3780666 0.3797451 0.3751659 0.369692757
## [,365] [,366] [,367] [,368] [,369] [,370]
## [1,] -0.04776402 -0.03461158 -0.04087996 -0.3991653 -0.01684248 -0.04233174
## [2,] 0.37049597 0.36934496 0.36939539 0.3844842 0.36847654 0.37357344
## [,371] [,372] [,373] [,374] [,375] [,376]
## [1,] 0.2148773 -0.2670082 0.1968462 -0.1322288 -0.1264419 -0.1050594
## [2,] 0.3653174 0.3800377 0.3604642 0.3737159 0.3729835 0.3729708
## [,377] [,378] [,379] [,380] [,381] [,382]
## [1,] -0.07104581 0.2237890 -0.3818473 -0.3369181 0.2178351 -0.3244527
## [2,] 0.37002117 0.3606811 0.3812183 0.3788425 0.3635664 0.3807280
## [,383] [,384] [,385] [,386] [,387] [,388]
## [1,] -0.03847784 0.1832655 -0.09072682 -0.3930446 -0.1319919 -0.01921118
## [2,] 0.36852607 0.3648326 0.37259339 0.3770924 0.3720948 0.37168799
## [,389] [,390] [,391] [,392] [,393] [,394]
## [1,] 0.05609107 0.1835828 0.2633293 -0.08218185 -0.2317203 -0.1931842
## [2,] 0.36703751 0.3606227 0.3620531 0.37254441 0.3736395 0.3757126
## [,395] [,396] [,397] [,398] [,399] [,400]
## [1,] -0.01925103 -0.06657695 -0.2108233 -0.3390252 -0.03423483 -0.1402107
## [2,] 0.36944112 0.37289236 0.3754131 0.3814553 0.37008349 0.3732497
## [,401] [,402] [,403] [,404] [,405] [,406]
## [1,] -0.4886941 -0.09919996 -0.3203115 -0.08689107 -0.01583184 -0.5998494
## [2,] 0.3870136 0.37389608 0.3781961 0.37124197 0.36785917 0.3885344
## [,407] [,408] [,409] [,410] [,411] [,412]
## [1,] -0.1286696 -0.1649800 -0.07864133 0.3421002 -0.1428651 -0.04857933
## [2,] 0.3731741 0.3754177 0.37244351 0.3580278 0.3750941 0.37101961
## [,413] [,414] [,415] [,416] [,417] [,418]
## [1,] -0.0758769 -0.1544816 -0.3796079 0.01929865 -0.2013091 -0.4757715
## [2,] 0.3710651 0.3738932 0.3804897 0.36978957 0.3760883 0.3842777
## [,419] [,420] [,421] [,422] [,423] [,424]
## [1,] -0.08934291 -0.09714685 -0.1970010 -0.2649452 -0.2615710 -0.2277468
## [2,] 0.37195219 0.37118208 0.3739385 0.3782544 0.3767569 0.3786748
## [,425] [,426] [,427] [,428] [,429] [,430] [,431]
## [1,] -0.286866 -0.1302165 0.1662883 0.01978423 -0.0764468 -0.4454052 -0.1103693
## [2,] 0.378703 0.3728509 0.3628257 0.37027230 0.3737269 0.3863418 0.3747381
## [,432] [,433] [,434] [,435] [,436] [,437]
## [1,] -0.05787697 -0.2282570 -0.1101949 0.1151605 -0.0982731 -0.2390138
## [2,] 0.37173473 0.3758366 0.3728018 0.3642355 0.3729677 0.3778095
## [,438] [,439] [,440] [,441] [,442] [,443]
## [1,] -0.2359170 -0.2878570 -0.0008214773 -0.1999519 -0.3014317 -0.2052628
## [2,] 0.3784526 0.3811732 0.3702160342 0.3753222 0.3784968 0.3744326
## [,444] [,445] [,446] [,447] [,448] [,449]
## [1,] -0.2493862 -0.09044753 -0.4475614 -0.3619718 -0.02924033 -0.09691818
## [2,] 0.3755156 0.37277377 0.3838749 0.3810482 0.36887442 0.37287728
## [,450] [,451] [,452] [,453] [,454] [,455]
## [1,] -0.2800555 0.1911666 -0.07557055 -0.01677341 -0.2887878 -0.3246249
## [2,] 0.3776558 0.3635276 0.37317027 0.36898622 0.3817228 0.3796999
## [,456] [,457] [,458] [,459] [,460] [,461]
## [1,] -0.1889024 0.1462730 -0.06409476 -0.1870514 0.06731083 0.06771655
## [2,] 0.3751807 0.3641256 0.37139607 0.3739460 0.36727511 0.36766497
## [,462] [,463] [,464] [,465] [,466] [,467]
## [1,] -0.1198227 0.1169792 -0.01379596 -0.1591387 -0.09179612 -0.1878316
## [2,] 0.3733176 0.3639097 0.36913075 0.3747378 0.37317180 0.3750554
## [,468] [,469] [,470] [,471] [,472] [,473] [,474]
## [1,] 0.08353515 -0.2168703 -0.2095441 0.5569628 0.2445981 0.03225198 -0.1012703
## [2,] 0.36654148 0.3740016 0.3764124 0.3532075 0.3613565 0.36860574 0.3726416
## [,475] [,476] [,477] [,478] [,479] [,480]
## [1,] -0.06106426 -0.2554021 -0.1347401 0.006441018 0.01442969 -0.2071047
## [2,] 0.36948265 0.3782061 0.3739733 0.368925658 0.36920602 0.3742149
## [,481] [,482] [,483] [,484] [,485] [,486] [,487]
## [1,] -0.2884107 -0.1728562 0.1547526 -0.3102705 0.263810 0.2119325 -0.3063157
## [2,] 0.3780011 0.3726880 0.3654791 0.3796437 0.361154 0.3603181 0.3792346
## [,488] [,489] [,490] [,491] [,492] [,493]
## [1,] 0.0121210 -0.5614250 -0.3145520 0.05280185 -0.04565506 -0.01993817
## [2,] 0.3681752 0.3886775 0.3785218 0.36954257 0.37304481 0.36803104
## [,494] [,495] [,496] [,497] [,498] [,499]
## [1,] -0.2858513 0.09420807 -0.2336680 -0.2055985 -0.3049568 0.07963816
## [2,] 0.3809120 0.36709257 0.3746703 0.3774963 0.3811679 0.36545393
## [,500] [,501] [,502] [,503] [,504] [,505]
## [1,] -0.2883843 -0.05674144 -0.1774285 -0.1478562 -0.2935204 0.2233255
## [2,] 0.3772446 0.37313360 0.3750665 0.3767220 0.3813245 0.3605785
## [,506] [,507] [,508] [,509] [,510] [,511]
## [1,] -0.1804006 -0.04450607 -0.2998887 0.2218690 -0.1719975 -0.04171362
## [2,] 0.3760628 0.37150746 0.3784688 0.3624891 0.3764196 0.37176454
## [,512] [,513] [,514] [,515] [,516] [,517]
## [1,] 0.1898020 0.2238072 -0.1762671 -0.03421117 0.09724656 -0.2560481
## [2,] 0.3621399 0.3617458 0.3759718 0.37004629 0.36707138 0.3794311
## [,518] [,519] [,520] [,521] [,522] [,523]
## [1,] -0.3676861 0.005728077 -0.3967287 0.003966966 -0.3555276 -0.3472432
## [2,] 0.3818916 0.366200163 0.3838375 0.368485035 0.3806328 0.3817014
## [,524] [,525] [,526] [,527] [,528] [,529] [,530]
## [1,] 0.1169597 0.285846 0.1135758 -0.2113120 -0.2850437 0.1936552 -0.1856108
## [2,] 0.3643387 0.358832 0.3649925 0.3755613 0.3784754 0.3647632 0.3745020
## [,531] [,532] [,533] [,534] [,535] [,536]
## [1,] -0.1808059 0.05090086 -0.1524257 -0.1642312 -0.08931584 0.0277233
## [2,] 0.3761317 0.36635564 0.3743827 0.3712524 0.37043791 0.3678178
## [,537] [,538] [,539] [,540] [,541] [,542]
## [1,] 0.02056855 -0.1331622 0.1464592 0.1083481 -0.1896781 -0.02326434
## [2,] 0.36880599 0.3749768 0.3649479 0.3643381 0.3766322 0.36828912
## [,543] [,544] [,545] [,546] [,547] [,548]
## [1,] -0.1566308 -0.1668187 0.1485775 -0.02156156 -0.1723268 -0.4403763
## [2,] 0.3734295 0.3735510 0.3641198 0.36822387 0.3747705 0.3819151
## [,549] [,550] [,551] [,552] [,553] [,554]
## [1,] -0.1382565 -0.04075058 0.1161289 -0.3140391 0.02398682 -0.1905895
## [2,] 0.3753749 0.37147869 0.3648144 0.3809879 0.37049074 0.3742025
## [,555] [,556] [,557] [,558] [,559] [,560]
## [1,] -0.07130758 -0.4144822 0.2131345 0.03408248 -0.2406548 -0.3111497
## [2,] 0.37407988 0.3826338 0.3628725 0.36653683 0.3772095 0.3792505
## [,561] [,562] [,563] [,564] [,565] [,566]
## [1,] -0.1379791 -0.2005172 0.08187318 0.09697036 -0.007921702 0.1838605
## [2,] 0.3722231 0.3750557 0.36497299 0.36752144 0.368430735 0.3649502
## [,567] [,568] [,569] [,570] [,571] [,572]
## [1,] -0.364154 0.1015884 -0.2929507 -0.00114501 -0.006854127 -0.3845271
## [2,] 0.379571 0.3647122 0.3799863 0.36942780 0.368314347 0.3804841
## [,573] [,574] [,575] [,576] [,577] [,578]
## [1,] 0.1802551 0.1252089 -0.08581391 -0.3439447 -0.2039211 -0.07303347
## [2,] 0.3642427 0.3662850 0.37379179 0.3809772 0.3776440 0.37258899
## [,579] [,580] [,581] [,582] [,583] [,584]
## [1,] 0.1154942 -0.04397447 -0.1698997 0.1016407 -0.2193359 -0.07177716
## [2,] 0.3658361 0.36997385 0.3740544 0.3653294 0.3765581 0.36971293
## [,585] [,586] [,587] [,588] [,589] [,590] [,591]
## [1,] -0.3485947 -0.3829681 -0.2634945 0.1636874 0.1900403 -0.2348482 -0.4016433
## [2,] 0.3818426 0.3819105 0.3767101 0.3649484 0.3625576 0.3791558 0.3804095
## [,592] [,593] [,594] [,595] [,596] [,597]
## [1,] 0.2347770 -0.3212522 -0.006743518 -0.1008274 -0.02352037 -0.04745187
## [2,] 0.3634891 0.3814172 0.368799060 0.3741680 0.37098402 0.36820523
## [,598] [,599] [,600] [,601] [,602] [,603]
## [1,] -0.1209470 -0.2190040 -0.3954114 0.03641582 -0.1836549 -0.02629758
## [2,] 0.3720776 0.3770542 0.3821250 0.36654098 0.3740667 0.36877891
## [,604] [,605] [,606] [,607] [,608] [,609]
## [1,] -0.2832380 -0.06889913 -0.09497088 -0.03258941 -0.3663577 -0.1222368
## [2,] 0.3796084 0.37162435 0.36992841 0.37135501 0.3805831 0.3744265
## [,610] [,611] [,612] [,613] [,614] [,615]
## [1,] 0.01923891 0.1282454 -0.1241517 0.1272598 -0.2181499 -0.005263687
## [2,] 0.37076842 0.3635624 0.3727023 0.3654552 0.3742297 0.369451289
## [,616] [,617] [,618] [,619] [,620] [,621]
## [1,] -0.03319695 -0.05829558 -0.3205273 -0.2848412 -0.2020812 0.1984475
## [2,] 0.37290611 0.37135818 0.3810148 0.3815441 0.3764887 0.3609679
## [,622] [,623] [,624] [,625] [,626] [,627]
## [1,] -0.1608899 -0.1732397 0.1288484 -0.3687457 -0.2256736 -0.0366664
## [2,] 0.3736440 0.3747286 0.3649218 0.3822999 0.3766399 0.3707740
## [,628] [,629] [,630] [,631] [,632] [,633]
## [1,] 0.05643687 -0.4991106 -0.04834132 -0.1049340 -0.2844193 -0.4781076
## [2,] 0.36904954 0.3860706 0.37187300 0.3727066 0.3767136 0.3854052
## [,634] [,635] [,636] [,637] [,638] [,639]
## [1,] 0.06003602 -0.2098013 -0.1230284 -0.1563353 -0.1172862 0.08240791
## [2,] 0.36951460 0.3748617 0.3731278 0.3738437 0.3728287 0.36459887
## [,640] [,641] [,642] [,643] [,644] [,645]
## [1,] -0.2427880 0.04933951 -0.2071890 -0.004095928 -0.08753608 -0.1619646
## [2,] 0.3748439 0.37089535 0.3747884 0.370148049 0.37189931 0.3737777
## [,646] [,647] [,648] [,649] [,650] [,651]
## [1,] -0.4105376 -0.2273272 0.08415688 -0.2823533 -0.3196409 -0.1713661
## [2,] 0.3820743 0.3746620 0.36694653 0.3776507 0.3796642 0.3762097
## [,652] [,653] [,654] [,655] [,656] [,657]
## [1,] -0.04192064 -0.4834136 0.09161986 -0.1349118 0.0560842 -0.08510657
## [2,] 0.37275956 0.3840805 0.36419784 0.3758140 0.3688556 0.37259798
## [,658] [,659] [,660] [,661] [,662] [,663]
## [1,] 0.02497601 -0.2892110 0.0251814 -0.03043548 -0.3219577 -0.04941987
## [2,] 0.36802530 0.3789285 0.3688068 0.36894658 0.3778775 0.36954619
## [,664] [,665] [,666] [,667] [,668] [,669]
## [1,] -0.5657372 -0.07323013 0.05037536 -0.2304902 -0.1181620 -0.2877383
## [2,] 0.3877740 0.37274753 0.36832694 0.3743410 0.3753213 0.3768296
## [,670] [,671] [,672] [,673] [,674] [,675]
## [1,] 0.1761721 -0.3677608 0.1316485 -0.5871968 0.05149253 -0.09566308
## [2,] 0.3632107 0.3816675 0.3669960 0.3913570 0.36561459 0.37271632
## [,676] [,677] [,678] [,679] [,680] [,681]
## [1,] 0.04053439 -0.2071993 0.2175798 -0.2853213 0.08178606 -0.2405383
## [2,] 0.37077903 0.3757736 0.3618785 0.3800854 0.36671652 0.3770508
## [,682] [,683] [,684] [,685] [,686] [,687]
## [1,] -0.1091626 -0.1850398 0.1455943 0.09264011 0.0957455 -0.07197377
## [2,] 0.3733962 0.3751718 0.3644650 0.36694619 0.3632429 0.37225201
## [,688] [,689] [,690] [,691] [,692] [,693] [,694]
## [1,] -0.4429258 -0.1811913 -0.4351497 0.2658900 -0.5335963 -0.3289125 0.1372720
## [2,] 0.3848082 0.3774804 0.3842005 0.3612005 0.3856932 0.3794786 0.3658194
## [,695] [,696] [,697] [,698] [,699] [,700]
## [1,] 0.0002716102 0.05834558 -0.03584119 -0.3810595 -0.1301994 0.3184099
## [2,] 0.3702201209 0.36495854 0.36939404 0.3831894 0.3728202 0.3580694
## [,701] [,702] [,703] [,704] [,705] [,706]
## [1,] -0.2897338 -0.1445313 0.1269508 0.03831212 -0.007464817 -0.1417021
## [2,] 0.3770947 0.3771715 0.3652016 0.36890521 0.370688971 0.3748698
## [,707] [,708] [,709] [,710] [,711] [,712]
## [1,] 0.2299883 0.07917388 -0.2026520 -0.1311726 -0.09454471 -0.5020655
## [2,] 0.3628941 0.36382724 0.3765103 0.3730751 0.37257290 0.3873070
## [,713] [,714] [,715] [,716] [,717] [,718]
## [1,] -0.0913654 -0.2081776 -0.003244309 0.2566930 -0.2100404 -0.2725204
## [2,] 0.3724797 0.3749820 0.371455188 0.3594696 0.3769934 0.3759122
## [,719] [,720] [,721] [,722] [,723] [,724] [,725]
## [1,] -0.3932293 -0.2484994 0.2669428 0.3419232 -0.5137574 -0.1545898 -0.2282580
## [2,] 0.3799122 0.3767998 0.3582166 0.3576717 0.3849096 0.3737396 0.3784455
## [,726] [,727] [,728] [,729] [,730] [,731]
## [1,] 0.1382665 -0.2518236 -0.01098614 -0.4136209 -0.06675885 0.05084372
## [2,] 0.3642222 0.3775885 0.36934982 0.3832997 0.37221047 0.36816985
## [,732] [,733] [,734] [,735] [,736] [,737]
## [1,] -0.04659658 0.1158141 -0.1039990 -0.4486328 -0.3863047 -0.0498607
## [2,] 0.36966887 0.3660528 0.3714092 0.3861052 0.3822079 0.3705275
## [,738] [,739] [,740] [,741] [,742] [,743]
## [1,] -0.1504773 -0.2748487 -0.08705829 0.3111431 0.2170165 0.03527721
## [2,] 0.3729000 0.3768301 0.37222861 0.3574847 0.3627037 0.36711628
## [,744] [,745] [,746] [,747] [,748] [,749]
## [1,] -0.4499896 -0.2336318 -0.09742464 -0.1368241 -0.06583633 0.04844413
## [2,] 0.3834994 0.3760942 0.37374494 0.3728163 0.37284986 0.36817593
## [,750] [,751] [,752] [,753] [,754] [,755] [,756]
## [1,] -0.2648432 -0.2938352 0.4074211 0.1101523 0.1008779 -0.2827675 -0.1499602
## [2,] 0.3760189 0.3785756 0.3563896 0.3690359 0.3639685 0.3778447 0.3728329
## [,757] [,758] [,759] [,760] [,761] [,762]
## [1,] -0.07450132 -0.06226595 -0.1431719 0.1551875 -0.4138716 -0.4112839
## [2,] 0.36924657 0.37170544 0.3757131 0.3612258 0.3813923 0.3817403
## [,763] [,764] [,765] [,766] [,767] [,768]
## [1,] -0.4113657 -0.04644612 -0.2114258 -0.1066776 -0.1784335 -0.3662830
## [2,] 0.3815371 0.36982169 0.3772763 0.3721756 0.3751582 0.3820707
## [,769] [,770] [,771] [,772] [,773] [,774] [,775]
## [1,] -0.0480411 0.2864896 -0.1061170 0.03350971 0.1937146 0.4286421 0.07998915
## [2,] 0.3728702 0.3600483 0.3727912 0.36556965 0.3644977 0.3535854 0.36657581
## [,776] [,777] [,778] [,779] [,780] [,781]
## [1,] -0.04831127 -0.06964593 -0.08495646 -0.4127319 -0.02267591 -0.01357529
## [2,] 0.36873010 0.37292149 0.37184405 0.3831796 0.36989505 0.36622172
## [,782] [,783] [,784] [,785] [,786] [,787]
## [1,] -0.3494182 -0.1110815 -0.2449066 0.2611745 -0.2276887 -0.1758712
## [2,] 0.3800033 0.3729744 0.3743801 0.3589633 0.3781928 0.3747265
## [,788] [,789] [,790] [,791] [,792] [,793]
## [1,] -0.1334870 -0.6782349 0.09250175 -0.0898074 0.08125411 -0.3017651
## [2,] 0.3719437 0.3914617 0.36532206 0.3708395 0.36506482 0.3790834
## [,794] [,795] [,796] [,797] [,798] [,799]
## [1,] -0.2447477 -0.1178408 -0.1025789 -0.06734088 0.01897305 -0.2099462
## [2,] 0.3758811 0.3736651 0.3728445 0.36945337 0.36934840 0.3737393
## [,800] [,801] [,802] [,803] [,804] [,805]
## [1,] 0.02234273 0.06204514 -0.2880346 -0.1961433 0.02762689 -0.07820202
## [2,] 0.36692256 0.36900807 0.3792100 0.3741928 0.36979388 0.37255318
## [,806] [,807] [,808] [,809] [,810] [,811]
## [1,] -0.4429073 -0.006822478 -0.09665177 -0.2716740 -0.2207473 -0.4028242
## [2,] 0.3850621 0.366185411 0.37255063 0.3780883 0.3789110 0.3828601
## [,812] [,813] [,814] [,815] [,816] [,817]
## [1,] -0.1523097 -0.1631838 -0.2490495 -0.2885614 -0.2551454 -0.2654920
## [2,] 0.3780036 0.3732900 0.3761289 0.3809557 0.3791039 0.3780853
## [,818] [,819] [,820] [,821] [,822] [,823]
## [1,] 0.09831427 -0.2722028 -0.2264243 0.1630198 -0.4340619 -0.2559811
## [2,] 0.36436803 0.3778264 0.3764931 0.3661927 0.3843907 0.3767165
## [,824] [,825] [,826] [,827] [,828] [,829]
## [1,] -0.2155488 -0.2698937 -0.1393331 -0.0438217 0.1310064 -0.4269851
## [2,] 0.3759540 0.3781535 0.3725149 0.3689246 0.3653759 0.3838597
## [,830] [,831] [,832] [,833] [,834] [,835] [,836]
## [1,] -0.2268002 -0.4026863 -0.1187238 0.2186059 -0.1307797 0.1550991 -0.3653743
## [2,] 0.3758912 0.3825954 0.3729941 0.3647256 0.3736684 0.3637975 0.3806287
## [,837] [,838] [,839] [,840] [,841] [,842]
## [1,] -0.5379527 -0.01940113 -0.4698551 0.2126521 -0.1918600 -0.2795880
## [2,] 0.3839791 0.36764991 0.3833499 0.3657136 0.3745655 0.3758467
## [,843] [,844] [,845] [,846] [,847] [,848]
## [1,] -0.2125107 -0.2303035 -0.1732157 -0.4526832 -0.1507396 -0.2317038
## [2,] 0.3783387 0.3756291 0.3728118 0.3834450 0.3739082 0.3786407
## [,849] [,850] [,851] [,852] [,853] [,854]
## [1,] 0.06129409 0.1105313 -0.1870993 -0.3726746 -0.1940901 -0.3197105
## [2,] 0.36502801 0.3653439 0.3733063 0.3799559 0.3757917 0.3808070
## [,855] [,856] [,857] [,858] [,859] [,860]
## [1,] -0.4086225 0.02168187 -0.02062995 0.2623175 -0.1473902 -0.2282151
## [2,] 0.3827024 0.36526897 0.37138223 0.3638477 0.3742192 0.3774229
## [,861] [,862] [,863] [,864] [,865] [,866]
## [1,] -0.1395268 -0.1259480 -0.2065607 -0.3230438 -0.1389194 0.07203052
## [2,] 0.3735505 0.3733019 0.3800760 0.3820517 0.3752251 0.36730331
## [,867] [,868] [,869] [,870] [,871] [,872]
## [1,] -0.06484378 -0.1496253 0.03806259 -0.3294538 0.1344929 -0.1787147
## [2,] 0.37170448 0.3730774 0.36961841 0.3797783 0.3650753 0.3749411
## [,873] [,874] [,875] [,876] [,877] [,878]
## [1,] -0.2303365 -0.01298987 0.1382309 0.1022302 -0.2189159 -0.1860622
## [2,] 0.3777617 0.36707908 0.3654436 0.3667322 0.3768405 0.3747101
## [,879] [,880] [,881] [,882] [,883] [,884]
## [1,] -0.2853726 -0.5930671 -0.1701600 0.05251358 0.1683174 -0.05721029
## [2,] 0.3794687 0.3918210 0.3765418 0.36870160 0.3650861 0.37160741
## [,885] [,886] [,887] [,888] [,889] [,890]
## [1,] -0.4599646 -0.005424481 0.04150602 -0.2842067 -0.1985066 -0.04696441
## [2,] 0.3841920 0.367068012 0.36913293 0.3786965 0.3744466 0.36846937
## [,891] [,892] [,893] [,894] [,895] [,896]
## [1,] -0.09518687 -0.1439844 -0.3170808 -0.1521710 -0.08000762 -0.3980705
## [2,] 0.37333163 0.3732964 0.3816289 0.3744289 0.37039145 0.3822278
## [,897] [,898] [,899] [,900] [,901] [,902]
## [1,] -0.3762167 0.1154678 -0.1507075 -0.04289047 -0.3674495 -0.05945264
## [2,] 0.3842590 0.3652755 0.3761631 0.36861042 0.3801249 0.37130189
## [,903] [,904] [,905] [,906] [,907] [,908]
## [1,] -0.4940154 -0.09331072 -0.06590909 0.02977919 -0.1757243 -0.08296935
## [2,] 0.3842975 0.36944750 0.36807053 0.36834128 0.3776038 0.37254340
## [,909] [,910] [,911] [,912] [,913] [,914]
## [1,] 0.0428296 -0.06862014 -0.3321358 -0.02104676 0.2530404 0.006337124
## [2,] 0.3671810 0.37126765 0.3793343 0.37062745 0.3590029 0.368704017
## [,915] [,916] [,917] [,918] [,919] [,920]
## [1,] 0.05873053 -0.0805584 -0.1179157 0.2040023 -0.02137533 0.03265604
## [2,] 0.36496969 0.3725268 0.3718873 0.3612314 0.37116906 0.36665286
## [,921] [,922] [,923] [,924] [,925] [,926]
## [1,] -0.2428233 -0.1816755 -0.01852948 -0.03620869 0.03087203 0.2121488
## [2,] 0.3773067 0.3744721 0.36873606 0.36920646 0.36543173 0.3673216
## [,927] [,928] [,929] [,930] [,931] [,932]
## [1,] 0.1179600 -0.3475350 -0.3291691 -0.03496614 -0.3395129 0.1608777
## [2,] 0.3656392 0.3820577 0.3809586 0.37347904 0.3805523 0.3637065
## [,933] [,934] [,935] [,936] [,937] [,938]
## [1,] -0.1068784 -0.1300541 -0.07232504 -0.3133406 -0.04503385 -0.07134781
## [2,] 0.3744055 0.3757259 0.37252451 0.3767250 0.37109841 0.37038079
## [,939] [,940] [,941] [,942] [,943] [,944]
## [1,] -0.1183899 -0.2810481 0.01565023 -0.02636767 -0.1222524 0.1056379
## [2,] 0.3734466 0.3759888 0.36931529 0.37258486 0.3751458 0.3654868
## [,945] [,946] [,947] [,948] [,949] [,950]
## [1,] -0.5169445 -0.05813048 -0.1418447 -0.05352817 -0.2461404 0.2404185
## [2,] 0.3884650 0.36908278 0.3778448 0.36736897 0.3778710 0.3619425
## [,951] [,952] [,953] [,954] [,955] [,956]
## [1,] -0.3233161 -0.1870662 -0.1371304 -0.3400127 -0.1016214 -0.3230985
## [2,] 0.3807696 0.3744280 0.3745121 0.3798336 0.3717297 0.3809377
## [,957] [,958] [,959] [,960] [,961] [,962]
## [1,] 0.02363833 -0.4133086 -0.5522021 0.2161869 -0.1962173 -0.3450996
## [2,] 0.36737883 0.3805931 0.3863318 0.3605692 0.3746617 0.3788272
## [,963] [,964] [,965] [,966] [,967] [,968]
## [1,] -0.01254946 -0.0001278675 -0.2994139 -0.1393167 -0.04825667 -0.3444030
## [2,] 0.36916842 0.3702172673 0.3816380 0.3723690 0.37063357 0.3839392
## [,969] [,970] [,971] [,972] [,973] [,974]
## [1,] -0.2839153 -0.02297595 -0.1956346 -0.2450225 0.03084905 -0.2960501
## [2,] 0.3801629 0.37042351 0.3751683 0.3753024 0.36767911 0.3772948
## [,975] [,976] [,977] [,978] [,979] [,980]
## [1,] -0.2785336 -0.1740822 -0.2685068 -0.2211296 -0.1889721 -0.08599525
## [2,] 0.3786772 0.3766533 0.3761042 0.3769382 0.3750943 0.37260954
## [,981] [,982] [,983] [,984] [,985] [,986]
## [1,] 0.08878259 -0.009688213 -0.06573913 -0.188888 0.1661216 -0.1283650
## [2,] 0.36724321 0.369476745 0.37308731 0.373389 0.3607186 0.3744036
## [,987] [,988] [,989] [,990] [,991] [,992]
## [1,] -0.1276283 -0.06653752 -0.2043230 -0.4796595 0.09670279 -0.1207061
## [2,] 0.3742310 0.37249751 0.3770049 0.3846604 0.36608339 0.3696519
## [,993] [,994] [,995] [,996] [,997] [,998]
## [1,] -0.1976467 -0.1803191 -0.01581422 -0.03930603 0.1159669 -0.1186339
## [2,] 0.3763579 0.3769838 0.37081877 0.37130717 0.3689722 0.3716011
## [,999] [,1000]
## [1,] -0.03644096 0.06827887
## [2,] 0.37066213 0.36642341
se_bootstrap_rep
## [1] 0.2774881
The standard error result here is 0.2740808, indicating that the modeled data is pretty close to the original data (closer to 0 = good). The coefficients in my original linear model were -0.1042465 for culmen_mm and 0.3725280 for tarsus_mm.
In their 2011 paper, Stanton-Geddes and Anderson assessed the role of a facultative mutualism between a legume and soil rhizobia in limiting the plant’s range. After publishing, they deposited their data at Dryad http://datadryad.org/resource/doi:10.5061/dryad.8566. As part of their lab experiment, they looked at a variety of plant properties after growing plants with rhizobia from different regions. We want to look at what happened in that experiment during the March 12th sampling event.
Load the data. Vizualize. Then, fit and evaluate a generalized linear model with your choice of error distribution looking at the effect of rhizobial region and plant height as measured on march 12 as predictors of # of leaves on march 12. Does your model meet assumptions? If not, refit with a different. Why did you chose this (or these) error distribution(s)?
library(readr)
setwd("/Users/nathanstrozewski/Downloads/Biological Data Analysis/Midterm/Data/doi_10.5061_dryad.8566__v1/")
rhizo_field <- read_csv("field_inoculation_expt_2009.csv")
## Rows: 300 Columns: 24
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (19): site, region, pop, trt, e.leaves, july.height, july.leaves, num.he...
## dbl (5): soil.perN, pos, germ, july.stage, jul.browsed
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
rhizo_green <- read_csv("greenhouse_inoculation_expt_2010.csv")
## Rows: 240 Columns: 16
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (4): rhizobia, rhiz_region, plant_pop, date_planted
## dbl (12): rack, bench, bench_pos, rack_pos, in_leaf_num, in_height, leaf_mar...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
str(rhizo_field)
## spc_tbl_ [300 × 24] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ site : chr [1:300] "ACNW" "ACNW" "ACNW" "ACNW" ...
## $ region : chr [1:300] "beyond.N" "beyond.N" "beyond.N" "beyond.N" ...
## $ soil.perN : num [1:300] 0.128 0.128 0.128 0.128 0.128 ...
## $ pop : chr [1:300] "KZA" "KZA" "GCD" "GCD" ...
## $ trt : chr [1:300] "R" "R" "N" "R" ...
## $ pos : num [1:300] 1 2 3 4 5 6 7 8 9 10 ...
## $ germ : num [1:300] 0 0 1 1 0 1 1 1 0 0 ...
## $ e.leaves : chr [1:300] "." "." "2" "2" ...
## $ july.stage : num [1:300] 0 0 1 1 0 1 1 1 0 0 ...
## $ july.height : chr [1:300] "." "." "17" "." ...
## $ july.leaves : chr [1:300] "." "." "5" "1" ...
## $ num.herbivory : chr [1:300] "." "." "2" "0" ...
## $ num.disease : chr [1:300] "." "." "3" "0" ...
## $ jul.browsed : num [1:300] NA NA 0 1 NA 1 1 0 NA NA ...
## $ nodules : chr [1:300] "." "." "1" "." ...
## $ fall.stage : chr [1:300] "0" "0" "." "0" ...
## $ browsed_fall : chr [1:300] "." "." "." "." ...
## $ fall.height : chr [1:300] "." "." "." "." ...
## $ fall.branch : chr [1:300] "." "." "." "." ...
## $ fall.pods : chr [1:300] "." "." "." "." ...
## $ fall.leaves : chr [1:300] "." "." "." "." ...
## $ fall.herbivory: chr [1:300] "." "." "." "." ...
## $ fall.disease : chr [1:300] "." "." "." "." ...
## $ fall.notes : chr [1:300] "." "." "." "." ...
## - attr(*, "spec")=
## .. cols(
## .. site = col_character(),
## .. region = col_character(),
## .. soil.perN = col_double(),
## .. pop = col_character(),
## .. trt = col_character(),
## .. pos = col_double(),
## .. germ = col_double(),
## .. e.leaves = col_character(),
## .. july.stage = col_double(),
## .. july.height = col_character(),
## .. july.leaves = col_character(),
## .. num.herbivory = col_character(),
## .. num.disease = col_character(),
## .. jul.browsed = col_double(),
## .. nodules = col_character(),
## .. fall.stage = col_character(),
## .. browsed_fall = col_character(),
## .. fall.height = col_character(),
## .. fall.branch = col_character(),
## .. fall.pods = col_character(),
## .. fall.leaves = col_character(),
## .. fall.herbivory = col_character(),
## .. fall.disease = col_character(),
## .. fall.notes = col_character()
## .. )
## - attr(*, "problems")=<externalptr>
summary(rhizo_field)
## site region soil.perN pop
## Length:300 Length:300 Min. :0.1275 Length:300
## Class :character Class :character 1st Qu.:0.1348 Class :character
## Mode :character Mode :character Median :0.2061 Mode :character
## Mean :0.1912
## 3rd Qu.:0.2382
## Max. :0.2496
##
## trt pos germ e.leaves
## Length:300 Min. : 1.00 Min. :0.0000 Length:300
## Class :character 1st Qu.:15.75 1st Qu.:0.0000 Class :character
## Mode :character Median :30.50 Median :0.0000 Mode :character
## Mean :30.50 Mean :0.3967
## 3rd Qu.:45.25 3rd Qu.:1.0000
## Max. :60.00 Max. :1.0000
##
## july.stage july.height july.leaves num.herbivory
## Min. :0.0000 Length:300 Length:300 Length:300
## 1st Qu.:0.0000 Class :character Class :character Class :character
## Median :0.0000 Mode :character Mode :character Mode :character
## Mean :0.3733
## 3rd Qu.:1.0000
## Max. :2.0000
##
## num.disease jul.browsed nodules fall.stage
## Length:300 Min. :0.0000 Length:300 Length:300
## Class :character 1st Qu.:0.0000 Class :character Class :character
## Mode :character Median :0.0000 Mode :character Mode :character
## Mean :0.4242
## 3rd Qu.:1.0000
## Max. :1.0000
## NA's :267
## browsed_fall fall.height fall.branch fall.pods
## Length:300 Length:300 Length:300 Length:300
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## fall.leaves fall.herbivory fall.disease fall.notes
## Length:300 Length:300 Length:300 Length:300
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
str(rhizo_green)
## spc_tbl_ [240 × 16] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ rack : num [1:240] 1 1 1 1 1 1 1 1 2 2 ...
## $ bench : num [1:240] 1 1 1 1 1 1 1 1 1 1 ...
## $ bench_pos : num [1:240] 1 1 1 1 1 1 1 1 2 2 ...
## $ rack_pos : num [1:240] 1 2 3 4 5 6 7 8 1 2 ...
## $ rhizobia : chr [1:240] "rSCWRS" "rSCWRS" "rSCWRS" "rSCWRS" ...
## $ rhiz_region : chr [1:240] "edge" "edge" "edge" "edge" ...
## $ plant_pop : chr [1:240] "KZA" "GCD" "TYS" "CRA" ...
## $ date_planted: chr [1:240] "28-Jan-10" "28-Jan-10" "28-Jan-10" "31-Jan-10" ...
## $ in_leaf_num : num [1:240] 3 2 4 3 3 4 4 3 2 3 ...
## $ in_height : num [1:240] 6 1.5 7.5 5.5 5 6.5 7.5 5.5 5 4 ...
## $ leaf_mar12 : num [1:240] 7 1 9 10 8 10 7 8 8 7 ...
## $ height_mar12: num [1:240] 10.5 5 14 12 9.5 13.5 12.5 10.5 12.5 11 ...
## $ branch_mar12: num [1:240] 0 0 0 0 0 0 0 0 0 0 ...
## $ flower_mar23: num [1:240] 0 0 0 0 0 0 0 0 0 0 ...
## $ height_apr20: num [1:240] 31.5 0 29 37 32 26.5 34 27 33 31 ...
## $ nodule_count: num [1:240] 26 0 33 55 19 30 42 34 34 31 ...
## - attr(*, "spec")=
## .. cols(
## .. rack = col_double(),
## .. bench = col_double(),
## .. bench_pos = col_double(),
## .. rack_pos = col_double(),
## .. rhizobia = col_character(),
## .. rhiz_region = col_character(),
## .. plant_pop = col_character(),
## .. date_planted = col_character(),
## .. in_leaf_num = col_double(),
## .. in_height = col_double(),
## .. leaf_mar12 = col_double(),
## .. height_mar12 = col_double(),
## .. branch_mar12 = col_double(),
## .. flower_mar23 = col_double(),
## .. height_apr20 = col_double(),
## .. nodule_count = col_double()
## .. )
## - attr(*, "problems")=<externalptr>
summary(rhizo_green)
## rack bench bench_pos rack_pos rhizobia
## Min. : 1.0 Min. :1 Min. : 1.0 Min. :1.00 Length:240
## 1st Qu.: 8.0 1st Qu.:1 1st Qu.: 3.0 1st Qu.:2.75 Class :character
## Median :15.5 Median :2 Median : 5.5 Median :4.50 Mode :character
## Mean :15.5 Mean :2 Mean : 5.5 Mean :4.50
## 3rd Qu.:23.0 3rd Qu.:3 3rd Qu.: 8.0 3rd Qu.:6.25
## Max. :30.0 Max. :3 Max. :10.0 Max. :8.00
## rhiz_region plant_pop date_planted in_leaf_num
## Length:240 Length:240 Length:240 Min. :0.000
## Class :character Class :character Class :character 1st Qu.:2.000
## Mode :character Mode :character Mode :character Median :3.000
## Mean :3.146
## 3rd Qu.:4.000
## Max. :6.000
## in_height leaf_mar12 height_mar12 branch_mar12
## Min. : 1.000 Min. : 1.000 Min. : 3.00 Min. :0.0000
## 1st Qu.: 4.500 1st Qu.: 6.000 1st Qu.: 8.50 1st Qu.:0.0000
## Median : 6.000 Median : 8.000 Median :11.00 Median :0.0000
## Mean : 5.775 Mean : 7.854 Mean :11.21 Mean :0.2292
## 3rd Qu.: 7.000 3rd Qu.: 9.000 3rd Qu.:13.00 3rd Qu.:0.0000
## Max. :10.500 Max. :21.000 Max. :22.50 Max. :4.0000
## flower_mar23 height_apr20 nodule_count
## Min. :0.00000 Min. : 0.00 Min. : 0.00
## 1st Qu.:0.00000 1st Qu.:17.38 1st Qu.: 0.00
## Median :0.00000 Median :27.00 Median : 11.00
## Mean :0.04583 Mean :25.04 Mean : 32.55
## 3rd Qu.:0.00000 3rd Qu.:32.00 3rd Qu.: 51.25
## Max. :1.00000 Max. :46.00 Max. :241.00
rhizo_green$rhiz_region[rhizo_green$rhiz_region == "c"] <- "Control"
rhizo_green$rhiz_region[rhizo_green$rhiz_region == "edge"] <- "Edge"
rhizo_green$rhiz_region[rhizo_green$rhiz_region == "interior"] <- "Interior"
rhizo_green$rhiz_region[rhizo_green$rhiz_region == "beyond"] <- "Beyond"
library(ggplot2) # plotting
library(ggpubr) # gg details
rhizo_green_region_plot <- ggplot(data = rhizo_green,
mapping = aes(x = rhiz_region,
y = leaf_mar12,
fill = rhiz_region)) +
geom_violin() +
geom_point() +
geom_jitter() +
labs(title = "Influence of Region on Rhizobial Leaf Number",
subtitle = "On March 12",
x = "Region (Relative to current distribution)",
y = "Number of Leaves",
fill = "Region") + # plot leaf num by region
custom_theme() # make it presentable
rhizo_green_region_text <- paste("Figure 6: Number of leaves as measured on March 12",
"was plotted by region. The width of the violin",
"indicates the density of measurements at that value.",
"Data was obtained from Stanton-Geddes& Anderson (2011).")
rhizo_green_region_para <- ggparagraph(text = rhizo_green_region_text,
size = 10,
color = "black") # format description
rhizo_green_region_final <- ggarrange(rhizo_green_region_plot,
rhizo_green_region_para,
ncol = 1,
nrow = 2,
heights = c(2.5, 1)) # combined figure and description
rhizo_green_height_plot <- ggplot(data = rhizo_green,
mapping = aes(x = height_mar12,
y = leaf_mar12,
fill = rhiz_region)) +
geom_violin(alpha = 0.5) +
labs(title = "Influence of Height on Rhizobial Leaf Number",
subtitle = "On March 12",
x = "Height (cm)",
y = "Number of Leaves",
fill = "Region") +
custom_theme()
rhizo_height_region_text <- paste("Figure 7: Number of leaves as measured on March 12",
"was plotted by sample height. The width of the violin",
"indicates the density of measurements at that value.",
"Data was obtained from Stanton-Geddes& Anderson (2011).")
rhizo_height_region_para <- ggparagraph(text = rhizo_height_region_text,
size = 10,
color = "black") # format description
rhizo_height_region_final <- ggarrange(rhizo_green_height_plot,
rhizo_height_region_para,
ncol = 1,
nrow = 2,
heights = c(2.5, 1)) # combined figure and description
## Warning: position_dodge requires non-overlapping x intervals
rhizo_plot <- ggplot(data = rhizo_green,
mapping = aes(x = height_mar12,
y = leaf_mar12,
color = rhiz_region)) +
geom_point(alpha = 0.5) +
facet_wrap(vars(rhiz_region)) +
stat_smooth() +
labs(title = "Influence of Height on Rhizobial Leaf Number",
subtitle = "On March 12",
x = "Height (cm)",
y = "Number of Leaves",
fill = "Region") +
custom_theme()
rhizo_text <- paste("Figure 8: Number of leaves as measured on March 12",
"was plotted by sample height and displayed by",
"region. Data was obtained from Stanton-Geddes",
"& Anderson (2011).")
rhizo_para <- ggparagraph(text = rhizo_height_region_text,
size = 10,
color = "black") # format description
rhizo_final <- ggarrange(rhizo_plot,
rhizo_para,
ncol = 1,
nrow = 2,
heights = c(2.5, 1)) # combined figure and description
rhizo_green_region_final
rhizo_height_region_final
rhizo_final
rhizo_glm <- glm(leaf_mar12 ~ rhiz_region + height_mar12,
family = poisson(link = "log"),
data = rhizo_green)
library(performance) # checking assumptions
check_model(rhizo_glm)
I used a Poisson distribution to write this generalized linear model (glm). In my glm, the dependent variable is the number of leaves counted on a sample (var_name = leaf_mar12). I initially interpreted this to be a continuous variable because the number of leaves can theoretically be any bounded value from 0 to infinity.
This is the code block I wrote when I did that:
> rhizo_glm <- glm(leaf_mar12 ~ rhiz_region + height_mar12,
family = Gamma(link = "log"),
data = rhizo_green) <
When I checked the assumptions of this model, everything looked excellent: the predicted data was almost exactly the same as the actual data, collinearity was low, etc.
However, I reevaluated my methods because I am a good scientist. I realized that my interpretation of the dependent variable was incorrect. The outcome, number of leaves, is actually a discrete variable and not continuous. For one, number of leaves counted can only be a whole number. Thus, the dependent variable values will always fit into discrete “bins”, such as 1 leaf, 2 leaves, etc. I rewrote my glm to reflect this.
When I checked the assumptions of this model (see {r 5a_glm} for model code), the posterior predictive check did not match the actual data as closely as the previous model. However, the match was still very close and the general trends aligned very closely. Despite not being as perfect a fit as the previous model, I am comfortable with the outcome here. There are no other apparent issues in the assumption checks, but it can’t hurt to look at the residuals to be sure.
library(DHARMa) # checking residuals
## This is DHARMa 0.4.6. For overview type '?DHARMa'. For recent changes, type news(package = 'DHARMa')
rhizo_residuals <- simulateResiduals(rhizo_glm)
plotQQunif(rhizo_residuals)
plotResiduals(rhizo_residuals)
testOverdispersion(rhizo_residuals)
## testOverdispersion is deprecated, switch your code to using the testDispersion function
##
## DHARMa nonparametric dispersion test via sd of residuals fitted vs.
## simulated
##
## data: simulationOutput
## dispersion = 0.71429, p-value < 2.2e-16
## alternative hypothesis: two.sided
Digging into the residuals further does indicate that there is an issue with overdispersion. I tested using a quasi-poisson family (instead of poisson) in my glm, and this revealed no substantial changes to the check_model() output. Unfortunately, I can’t run the quasi-poisson version of my model through the DHARMa package and struggled to find a resolution. Because the PPC is so good in my model, I think it is ok to move on. I’ll have to put in a disclaimer about this overdispersion when I publish figures from this model in Nature soon.
### 5b: Evaluate your treatments
Which rhizobial regions enable more leaves relative to the control? And in what direction?
library(emmeans) # to do cool stats stuffs
library(ggplot2) # plotting
rhizo_em <- emmeans(rhizo_glm, specs = ~ rhiz_region)
rhizo_em_dunnet <- contrast(rhizo_em,
method = "dunnett",
ref = "Control") %>%
confint()
rhizo_em_dunnet_plot <- plot(rhizo_em_dunnet,
# main = "Influence of Rhizobial Region on Leaf Number", # doesn't work
# sub = "On March 12", # doesn't work
xlab = "Estimate",
ylab = "Dunnett Test Comparison") +
geom_vline(xintercept = 0,
color = "magenta",
lty = 16) +
custom_theme()
rhizo_em_dunnet
## contrast estimate SE df asymp.LCL asymp.UCL
## Beyond - Control 0.269 0.0827 Inf 0.0739 0.464
## Edge - Control 0.196 0.0760 Inf 0.0163 0.375
## Interior - Control 0.198 0.0735 Inf 0.0246 0.372
##
## Results are given on the log (not the response) scale.
## Confidence level used: 0.95
## Conf-level adjustment: dunnettx method for 3 estimates
rhizo_em_dunnet_plot
Figure 9 (above; I couldn’t fit a figure description to this one) demonstrates that all regions lead to an increase in leaf number as compared to control, with the beyond region demonstrating the largest increase. The influence of interior and edge region is very similar.
So, your distribution has quantiles (right? We see these in QQ plots). You can see what the value for those quantiles are from the q* function for a distribution. For example, the 95th percentile of a poisson with a lambda of 5 spans from 1 to 10. You can see this with qpois(0.025, lambda = 5) for the lower, and change it to 0.975 for the upper. Check this out. Plot the upper and lower bounds of the 95th percentile of a Poisson distribution over a range of lambdas from 1 to 30. Do the same for a negative binomial with means from 1 to 30. Note, to translate from a mean ( ) with a size of 10 (you might have to look at the helpfile for qnbinom here).
library(dplyr) # data manipulation
poisson_tibble <- tibble(x = rep(seq(0.025, 0.975, by = 0.05), 30),
lambda = rep(1:30,
each = length(x)/30),
dens = qpois(x, lambda = lambda))
poisson_tibble <- poisson_tibble %>%
filter(x == 0.025 | x == 0.975) %>%
arrange(lambda)
poisson_plot <- ggplot(data = poisson_tibble,
mapping = aes(x = lambda,
y = dens)) +
geom_line(mapping = aes(group = lambda)) +
geom_point(mapping = aes(color = as.factor(x))) +
labs(title = "Bounds Between the 95th Percentile",
subtitle = "of a Poisson Distribution Over Lambda of 1:30",
x = "Lambda",
y = "Density",
color = "Bounds") +
custom_theme() +
scale_color_manual(breaks = c("0.025", "0.975"),
values=c("blue", "salmon"))
qpois_text <- paste("Figure 10: The bounds between the 95th percentile of a",
"Poisson distribution were calculated and plotted over.",
"a lambda of 1:30. Data was obtained from Stanton-Geddes",
"& Anderson (2011).")
qpois_para <- ggparagraph(text = qpois_text,
size = 10,
color = "black") # format description
poisson_plot_final <- ggarrange(poisson_plot,
qpois_para,
ncol = 1,
nrow = 2,
heights = c(2.5, 1)) # combined figure and description
poisson_plot_final
qbinom_tibble <- tibble(x = rep(seq(0.025, 0.975, by = 0.05), 30),
mu = rep(1:30,
each = length(x)/30),
size = 10,
prob = size/(size + mu),
dens = qbinom(x, size = 10, prob = prob))
qbinom_tibble <- qbinom_tibble %>%
filter(x == 0.025 | x == 0.975) %>%
arrange(mu)
qbinom_plot <- ggplot(data = qbinom_tibble,
mapping = aes(x = mu,
y = dens)) +
geom_line(mapping = aes(group = mu)) +
geom_point(mapping = aes(color = as.factor(x))) +
labs(title = "Bounds Between the 95th Percentile",
subtitle = "of a Negative Binomial Distribution Over a Mu of 1:30",
x = "Mu",
y = "Density",
color = "Bounds") +
custom_theme() +
scale_color_manual(breaks = c("0.025", "0.975"),
values=c("blue", "salmon"))
qbinom_text <- paste("Figure 11: The bounds between the 95th percentile of a",
"negative binomial distribution were calculated and",
"plotted over a mu of 1:30. Data was obtained from",
"Stanton-Geddes & Anderson (2011).")
qbinom_para <- ggparagraph(text = qbinom_text,
size = 10,
color = "black") # format description
qbinom_final <- ggarrange(qbinom_plot,
qbinom_para,
ncol = 1,
nrow = 2,
heights = c(2.5, 1)) # combined figure and description
qbinom_final
All right! Armed with this knowledge, one of the frustrating things about broom::augment() for glm models is that it doesn’t have an interval argument. And has one trick. One - you need to see what scale your answer is returned on. We want to look at the response scale - the scale of the data. Second, while you can use se_fit to get standard errors, you don’t get a CI per se (although, hey, ~2(se) = CI).
AND - we just looked at how when you have an estimated value, you can get the prediction CI yourself by hand in the previous part of the question. So, using your creativity, visualize the fit, 95% fit interval, and 95% prediction interval at the min, mean, and max of height for each treatment. Geoms can be anything you like! Have fun here!
library(dplyr) # data manipulation
library(tidyr) # pivot
library(broom) # augment()
# calc min mean max of each group
# seq out values of height for each group
# predict based on seq values
# use se output from precict() to calculate CI
# plug fit into qpois with range of min-max to calculate
rhizo_stats <- rhizo_green %>%
group_by(rhiz_region) %>%
summarise(mean = mean(height_mar12),
min = min(height_mar12),
max = max(height_mar12)) %>%
pivot_longer(cols = -rhiz_region,
names_to = "stat",
values_to = "height_mar12")
rhizo_prediction <- augment(rhizo_glm,
newdata = rhizo_stats,
type.predict = "response",
se = TRUE) %>%
mutate(lower_CI = (.fitted - 2*.se.fit),
upper_CI = (.fitted + 2*.se.fit),
lower_PI = qpois(0.025, lambda = .fitted),
upper_PI = qpois(0.975, lambda = .fitted))
rhizo_prediction
## # A tibble: 12 × 9
## rhiz_region stat height_ma…¹ .fitted .se.fit lower…² upper…³ lower…⁴ upper…⁵
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Beyond mean 11.3 8.49 0.459 7.57 9.41 3 15
## 2 Beyond min 3 5.73 0.456 4.82 6.64 2 11
## 3 Beyond max 21.5 13.8 1.16 11.5 16.1 7 22
## 4 Control mean 10.6 6.29 0.395 5.50 7.08 2 12
## 5 Control min 4.5 4.70 0.361 3.98 5.42 1 9
## 6 Control max 19 9.36 0.761 7.84 10.9 4 16
## 7 Edge mean 10.1 7.46 0.321 6.82 8.10 3 13
## 8 Edge min 4 5.58 0.341 4.90 6.26 2 11
## 9 Edge max 19 11.4 0.804 9.78 13.0 5 18
## 10 Interior mean 12.3 8.32 0.307 7.70 8.93 3 14
## 11 Interior min 4.5 5.73 0.385 4.96 6.50 2 11
## 12 Interior max 22.5 13.5 0.985 11.5 15.5 7 21
## # … with abbreviated variable names ¹height_mar12, ²lower_CI, ³upper_CI,
## # ⁴lower_PI, ⁵upper_PI
rhizo_prediction_plot <- ggplot(data = rhizo_prediction,
mapping = aes(x = stat,
y = .fitted)) +
geom_point(color = "black",
shape = 11) +
geom_linerange(mapping = aes(ymin = lower_PI,
ymax = upper_PI,
color = "Prediction Interval"),
width = 0.5) +
geom_linerange(mapping = aes(ymin = lower_CI,
ymax = upper_CI,
color = "Confidence Interval"),
width = 1) +
facet_wrap(vars(rhiz_region)) +
labs(title = "Confidence and Prediction Intervals",
subtitle = "For the Leaf Number on March 12",
y = "Fitted Value",
x = "",
color = "Interval") +
custom_theme()
## Warning: Ignoring unknown parameters: width
## Ignoring unknown parameters: width
rhizo_pred_text <- paste("Figure 12: Confidence and prediction intervals for the",
"maximum, mean, and minimum leaf number from March 12.",
"Data was grouped by region. Data was obtained from",
"Stanton-Geddes & Anderson (2011).")
rhizo_pred_para <- ggparagraph(text = rhizo_pred_text,
size = 10,
color = "black") # format description
rhizo_pred_final <- ggarrange(rhizo_prediction_plot,
rhizo_pred_para,
ncol = 1,
nrow = 2,
heights = c(2.5, 1)) # combined figure and description
rhizo_pred_final
Again, totally optional, but, check out the sinterval package which you’d have to install from github. It uses fit models and it’s two core functions add_fitted_sims() and add_predicted_sims() to get simulated values from fit models using one or the other interval. Do this, and visualize the same thing as above however you’d like (maybe look at ggdist?). Or try something new? Perhaps visualize across a range of heights, and not just three?
library(sinterval) # new sims functions
library(ggdist) # plotting
library(dplyr) # data manipulation
rhizo_sinterval <- rhizo_green %>%
select(rhiz_region, height_mar12, leaf_mar12)
rhizo_stats_sinterval_fitted <- add_fitted_sims(newdata = rhizo_sinterval,
mod = rhizo_glm,
n_sims = 10) %>%
arrange(rhiz_region)
rhizo_stats_sinterval_predicted <- add_predicted_sims(newdata = rhizo_sinterval,
mod = rhizo_glm,
n_sims = 10) %>%
arrange(rhiz_region)
rhizo_sinterval_with_sims <- cbind(rhizo_stats_sinterval_fitted,
rhizo_stats_sinterval_predicted$leaf_mar12_predict,
rhizo_prediction$lower_CI,
rhizo_prediction$upper_CI) %>%
rename(leaf_mar12_predict = "rhizo_stats_sinterval_predicted$leaf_mar12_predict",
lower_CI = "rhizo_prediction$lower_CI",
upper_CI = "rhizo_prediction$upper_CI") %>%
select(rhiz_region, leaf_mar12_fit, leaf_mar12_predict, lower_CI, upper_CI) %>%
mutate(upper_PI = leaf_mar12_fit + leaf_mar12_predict,
lower_PI = leaf_mar12_fit - leaf_mar12_predict)
rhizo_sinterval_plot <- ggplot(data = rhizo_sinterval_with_sims,
mapping = aes(x = rhiz_region,
y = leaf_mar12_fit)) +
geom_point(color = "black") +
geom_linerange(mapping = aes(ymin = lower_PI,
ymax = upper_PI),
color = "darkgrey") +
geom_linerange(mapping = aes(ymin = lower_CI,
ymax = upper_CI),
color = "magenta") +
labs(title = "Prediction and Confidence Intervals For Leaf Number",
subtitle = "on March12 Using the sinterval Package",
y = "Fitted Value",
x = "Region") +
custom_theme()
rhizo_sinterval_text <- paste("Figure 13: Prediction and confidence intervals for",
"the leaf number on March 12. Data points were",
"grouped by region.All data were obtained from",
"Stanton-Geddes & Anderson (2011).")
rhizo_sinterval_para <- ggparagraph(text = rhizo_sinterval_text,
size = 10,
color = "black") # format description
rhizo_sinterval_final <- ggarrange(rhizo_sinterval_plot,
rhizo_sinterval_para,
ncol = 1,
nrow = 2,
heights = c(2.5, 1)) # combined figure and description
rhizo_sinterval_final
I was able to use the sinterval package to obtain fitted and predicted values. I am unsure how CIs should be obtained here - I just used the CIs from the previous problem. This probably isn’t right but I still wanted to try this problem out and include it.
Tired. This semester has been really busy. I’m taking three courses so that I can take fewer next semester (my last semester of classes, and might be TA-ing). Three midterms spanned over one month was pretty draining. What really bugs me is that I am having issues with a cell line that I need for an experiment. I’ve been working on this project for about a year and the end is always just out of reach. If I can finish this experiment in the next few weeks I’ll be on top of the world.
I’m looking forward to getting back into a routine now that midterms are over. This might have to wait until after finals, though. My hobbies are related to exercise and health, so getting back into a rhythm with them will help me feel 100% and perform 100%.
I’m getting pretty good at plotting. Conceptualizing what you want to plot to best demonstrate the meaning of a data set is a really important skill, and this class has given me ample practice.
I think I’ve also gotten pretty good at picking up a foreign data set and making sense of what’s going on. Sometimes this is hard to do when the terminology, data types, etc. are outside of your realm of expertise. I have a feeling it’s frowned up, but I really like renaming all the variables to something that makes sense to me prior to processing and analyzing.
I’m nowhere near mastery - but I have improved with functions DRASTICALLY since we first covered them. I’m actually the most motivated to improve here - the utility is nearly endless and can expedite a lot of things. I’ve formulated a couple ideas for functions I want to write to automate calculations I often do in my lab work. Hopefully I’ll get around to those sometime soon.
To be honest, not much. I have some background knowledge, but I find that I really need to sit down with new material and do practice problems. The labs have been unbelievably helpful for this. The readings are also really good. I think the next step for me is to keep practice - participating in Tidy Tuesdays, picking personal data analysis projects (more on this later), implementing into my lab work, etc.
As mentioned, I have some previous experience with MATLAB, working with signal processing and basic data analysis. I played aroud writing a couple basic scripts to analyze biometric data, but nothing complicated or impressive.
I was really excited to take this class because I’ve always wanted to be proficient in a coding language for use in data analysis. This is a skill that will always help me in my career. I’ve always been a bit intimidated to learn, though - it’s a lot! I’ve dabbled in free online classes and resources, but always found myself getting to busy to commit. I knew that taking this class would force me stick with it and commit to it.
This is genuinely my favorite class I’ve ever taken. I love being able to develop a technical skill while also being able to use that skill to learn how to interpret data, analyze, design studies, etc. It’s fun to learn about genetics and cell signaling pathways and other molecular stuff - but the uniqueness of this class and what I’m taking away from it every day is unparalleled. Jarrett’s passion and humor is a huge factor in what makes this entertaining. I could see different instructors making this really boring and monotonous. Hat’s off to Jarrett’s continued efforts.
I’ve mentioned a few already, but I’d like to talk about a passion project(s) I’ve always wanted to do. I’m a huge baseball nerd - I grew up playing obsessively and was even scouted by the New York Mets as a teenager (quick flex). I always regretted not pursuing it further but found passion in studying biology.
One thing that separates baseball from other sports is how analytically-driven it is (Moneyball is just the tip of the iceberg). Data is recorded for every single thing that happens on the baseball field - the angle of the ball from the pitchers hand, the number of times and direction it spins, the speed of the ball off the bat, the path of runners on the bases or in the field, etc. All of these numbers are used to inform how players approach situations, how they train to improve their performance, the in-game strategies that managers implement, the scouting and evaluation of players by team’s front offices, and the acquisition of players to improve a team.
A lot of this data is made public, and a huge community has developed at Fangraphs, where professionals and amateurs analyze available data to answer interesting baseball research questions. I have ALWAYS wanted to contribute to the Community Research page, but have never had the skill set. A lot of top contributors are full-time jobs at Fangraphs. And a lot of the top Fangraphs analysts have been hired by the Research & Development departments for Major League Baseball teams.
I have a lot of interesting questions and ideas for contributing to this community. Starting over the winter, I’m going to take a stab at it. Maybe my final project for this course will lead to my first contribution.
I need to work on writing functions more. I’ve grown a lot, but I want to get more comfortable with designing the steps to reach my goal of the function. And also I want to get better at nesting functions. This can confuse me sometimes and spin my mind in circles.
The biggest thing I want to improve upon is formulating a piecewise plan to reach an end-goal. Conceptualizing the intermediate steps and how to obtain each is a tough tasks sometimes. I think a reverse-engineering approach would be helpful. Getting better at this before starting to code will save me a lot of time, blood, sweat and tears.
I also want to better understand which statistical tests to use when. I have seen a lot of people use certain statistical tests in their work just because other people did or because someone told them to. I want to be able to look at my data, formulate a question, figure out what I need to do to answer that question - including what type of statistics is most appropriate, then execute.
I’ve done a lot. I’ve learned the syntax and prose of R. I’ve practiced conceptualizing a game plan and the steps to reach my end-goal. I’ve had a lot of practice executing my game plan, realizing my mistakes, going back to the drawing board to fix them, and repeating. I’ve had a lot of practice looking at strange data that uses terms and approaches that are foreign to me.
All of these things are transferable skills that won’t end with this class.
I haven’t done too much digging into which statistical analyses to use when. I feel like that’s coming in the final weeks.
This exam stretched my abilities quite a bit. The hardest part was determining what I needed to do, then formulating steps to do it. I also think I have a tendency to brute-force things, and I tried really hard to be more elegant here. I have a lot of work to do to become truly elegant.
For the most part, everything we did hear was related to something we have done in class, lab, or homework. I was able to use those resources to figure it out (I think - we’ll find out). Without those resources (and the Back Row dream team), I think I would have been very stressed taking this.
This exam was actually excellent practice for all the concepts we’ve covered and I feel much more comfortable with a lot of the material now.
I almost always say sufficient/strong when asked this question because I don’t quite believe that I’ve truly handled the concepts and tasks well, regardless of if I did.
I’m going to be more confident and say strong here. I worked my butt off on this and I think it paid off, for all of the reasons I’ve discussed throughout the meta questions. I took a lot of extra time to learn new skills here, such as theme design, advanced ggplot animations, combining plots to make figure descriptions, R markdown presentation, etc. I gave a couple good attempts at the Impress Yourself’s - and even had some success. I better understand my strengths and the things I need to work on.
I am also going to say strong here. Comparing myself from day 1 to today is ridiculous. I’ve learned so much and truly feel I understand A LOT of the material and technique. I think I’m probably at the top in terms of effort, and it’s really cool to see it pay off.
I’ve mentioned a couple - improving design and use of functions, making sweet gg animations (mostly combining multiples and cleaning them up), mastering my approach/game plan and how to execute it, and understanding which statistical test to use and when. I also want to get better at condensing my code into as few lines/functions/steps as possible. All of these things will help me work faster and smarter.
I think tackling the extra credit assignments will help me get extra practice. I’m also going to continue attempting all the Impress Yourself’s on homeworks, labs, etc. Once those run out, maybe I’ll take a stab at running some baseball sabermetric analyses. I think at this point, I need as much practice as possible.